Hierarchical Matrix Approximation with Blockwise Constraints*

M. Bebendorf (a), M. Bollhöfer (b), and M. Bratsch (a)

(a) Institute for Numerical Simulation, Rheinische Friedrich-Wilhelms-Universität Bonn
(b) Technische Universität Braunschweig
March 22, 2011
A new technique is presented for preserving constraints in a hierarchical matrix approximation. This technique is used to preserve spaces from which the eigenvectors corresponding to small eigenvalues can be approximated, which guarantees spectral equivalence of the resulting approximate preconditioners.
1 Introduction

Hierarchical (H-) matrices (see [10, 12]) provide a setting in which fully populated large-scale matrices can be treated with logarithmic-linear complexity. The data-sparsity is achieved by restricting each block of a suitable hierarchical partition to a matrix of bounded rank. Obviously, not all matrices can be approximated by H-matrices. However, it can be proved (see [2]) that at least those matrices which arise from finite element discretizations of elliptic boundary value problems fall into this class. Although H-matrices provide replacements of the usual matrix operations such as addition, multiplication, and inversion, these operations are approximate. The reason for this is that the sum of two rank-$k$ matrices usually exceeds rank $k$. In order not to increase the complexity, the sum has to be truncated to rank $k$. Therefore, any precision can be achieved at the price of increasing $k$.

Despite the unavoidable H-matrix approximation error, it is sometimes necessary to preserve certain constraints of a matrix. For example, it might be helpful to ensure that some eigenvectors of the H-matrix approximation $\tilde A$ coincide with eigenvectors of the original matrix $A \in \mathbb{C}^{I\times J}$, where $I$ and $J$ are two index sets, so that spectral equivalence can be guaranteed. In this paper, we will mainly focus on this application, but the techniques presented can also be used in other cases, e.g. in mechanics, where rigid body motions shall remain in the null space. Also, it might be useful that the matrix preserves the constant vector. We are therefore faced with the problem that in addition to
$$\|A - \tilde A\| \le \varepsilon\,\|A\|, \qquad \varepsilon > 0, \qquad (1)$$
the constraints
$$\tilde A X = A X \qquad (2)$$
are to be satisfied for a given matrix $X \in \mathbb{C}^{J\times\ell}$.
* This work was supported by the DFG collaborative research center SFB 611.
A technique to satisfy conditions (1) and (2) is presented in [11, Sect. 6.8.1]. Its basic idea is to repair deviations from (2) resulting from the H-matrix approximation by a global rank-$\ell$ correction; for details see Sect. 2.1. This approach is designed to globally preserve a few vectors for the H-matrix approximation $\tilde A$ of $A$. The technique for preserving constraints presented in this article is based on a different approach. We preserve the respective part of the constraint on each block of the H-matrix partition by modifying the usual truncation. The constraint is kept during the whole approximation process rather than post-correcting it after the approximation has been computed. In particular, this allows constraints to be preserved for high-level operations in the H-matrix arithmetic, because approximation errors occur only when truncating the sums of low-rank matrices. The blockwise preservation of constraints has the advantage that not only (2) is satisfied: due to the hierarchical structure, a significantly larger set of vectors (what we call a cascadic set) can be preserved by the whole matrix if vectors conforming to the block's level are preserved on each block.

A major application of H-matrices is preconditioning second order elliptic boundary value problems. Since (1) guarantees spectral equivalence only for the upper part of the spectrum, whereas the relative accuracy of the smallest eigenvalues is determined by the condition number of $A$, the accuracy $\varepsilon$ has to be chosen smaller the larger the number of degrees of freedom is. Although $\varepsilon$ enters the complexity of the H-matrix approximation only logarithmically, this effect increases the asymptotic complexity. To avoid it, we propose to preserve approximations of the eigenvectors corresponding to small eigenvalues. The cascadic preservation of vectors introduced in this article can be used to span a linear space from which these eigenvectors can be approximated. Since eigenvectors corresponding to small eigenvalues are smooth in the case of second order elliptic differential operators, the required cascadic basis can be expected to be small.

The article is organized as follows. In Sect. 2, the required notation, basic concepts, and properties of hierarchical matrices are introduced. Sect. 2.1 reviews the correction method from [11]. In Sect. 3, two methods for the blockwise preservation of given vectors on each block of a hierarchical partition are presented. Their idea is to carry out the truncation perpendicular to the constraints. It will be seen that the preservation of blockwise constraints leads to the preservation of global constraints even for the H-matrix arithmetic. In Sect. 4, a more sophisticated way of preserving vectors is presented. This particular set of constraints guarantees that a significantly larger set of vectors is preserved by the whole matrix due to its hierarchical block structure. It will be seen that, without changing the overall complexity of the H-matrix approximation, a cascadic basis of dimension $O(k\log|J|)$ is preserved, where $k$ denotes the maximum rank of each block. This property is exploited in Sect. 5 for spanning a basis from which the eigenvectors corresponding to small eigenvalues are approximated. Spectral equivalence will be proved for the corresponding H-matrix preconditioner. In the last section, numerical experiments are presented which show the effectiveness of the presented techniques.
2 Hierarchical matrices

In many applications it is necessary to efficiently treat fully populated matrices arising from the discretization of non-local operators, e.g. finite or boundary element discretizations of integral operators and the inverses or LU factors of FE discretizations of elliptic partial differential operators. For this purpose, Tyrtyshnikov [16] and Hackbusch et al. [10, 12] introduced the structure of mosaic skeleton matrices or hierarchical matrices (H-matrices) [11]. Related approaches, which are designed only for the fast multiplication of a matrix by a vector, are the earlier fast summation methods: tree codes [1], fast multipole methods [8, 9], and panel clustering [13]. In contrast to many other existing fast methods which are based on multi-level structures, the
efficiency of H-matrices is due to two principles: matrix partitioning and low-rank representation. For an appropriate partition $P$ of the set of matrix indices $I\times J$, cluster trees $T_I$ and $T_J$ are constructed by recursively subdividing $I$ and $J$, respectively. The subdivision is done such that indices which are in some sense close to each other are grouped into the same cluster. There are three strategies which are commonly used. Two of them (bounding boxes [5] and principal component analysis [2]) use grid information, whereas the method presented in [4] is based on the matrix graph of a sparse matrix. In the following, $\mathcal{L}(T_I)$ will denote the set of leaves of $T_I$. The depth $\mathrm{level}(t)$ denotes the distance of $t\in T_I$ to the root $I$, and $L(T_I)$ will be used for the maximum level in $T_I$ increased by one.

The block cluster tree $T_{I\times J}$ is built by recursively subdividing $I\times J$. Each block $t\times s$ is subdivided into the sons $t'\times s'$, where $t'$ and $s'$ are taken from the lists of sons $S(t)$ and $S(s)$ of $t$ and $s$ in $T_I$ and $T_J$, respectively. The recursion is done for a block $b := t\times s$ until it is small enough or satisfies a so-called admissibility condition. The latter condition guarantees that the restriction $A_b$ of $A\in\mathbb{C}^{I\times J}$ can be approximated by a matrix of low rank; see [2]. All other blocks are small enough and stored as dense matrices. The set of leaves of the block cluster tree $T_{I\times J}$ constitutes the partition $P$.

The constructed partition has the property that for a given cluster $t\in T_I$ only a uniformly bounded number $c^r_{sp}(t) := |\{s\subset J : t\times s\in P\}|$ of blocks $t\times s$ appears in $P$. Similarly, given $s\in T_J$, the expression $c^c_{sp}(s) := |\{t\subset I : t\times s\in P\}|$ is bounded by a constant. Hence, the sparsity constant
$$c_{sp} := \max_{t\in T_I,\ s\in T_J}\{c^r_{sp}(t),\ c^c_{sp}(s)\}$$
is bounded independently of the sizes of $I$ and $J$; see [6]. The set of hierarchical matrices on the partition $P$ with blockwise rank $k$ is then defined as
$$H(P,k) := \{A\in\mathbb{C}^{I\times J} : \operatorname{rank} A_b \le k \text{ for all } b\in P\}.$$
Elements of $H(P,k)$ provide data-sparse representations of fully populated matrices, because the elements of this set can be stored with logarithmic-linear complexity. This can easily be seen from the boundedness of $c_{sp}$ and the inequality
$$\sum_{t\times s\in T_{I\times J}} |t| + |s| \le c_{sp}\,[L(T_I)|I| + L(T_J)|J|]; \qquad (3)$$
for further details see [2]. A major advantage of H-matrices over the above mentioned fast summation methods is that, by exploiting the hierarchical structure of the partition $P$, an approximate algebra on $H(P,k)$ can be defined which is based on divide-and-conquer versions of the usual block operations such as multiplication, inversion, and LU factorization.

It is obvious that $H(P,k)$ is not a linear space, because the rank of the sum of two rank-$k$ matrices is in general bounded only by $2k$. Therefore, the sum has to be truncated to rank $k$ in order not to exceed a maximum rank. A common way (see [2]) to approximate a rank-$k'$ matrix $UV^H$ by a matrix of lower rank $k < k'$ is to compute QR decompositions $U = Q_U R_U$ and $V = Q_V R_V$ and afterwards separate the product
$$R_U R_V^H = W\Sigma Z^H \in \mathbb{C}^{k'\times k'}, \qquad \Sigma := \operatorname{diag}(\sigma_1,\dots,\sigma_{k'}),$$
via a singular value decomposition. This yields
$$UV^H = Q_U R_U R_V^H Q_V^H = (Q_U W)\Sigma(Q_V Z)^H. \qquad (4)$$
As an approximation of $UV^H$ we use the rank-$k$ matrix $\tilde U\tilde V^H$, where
$$\tilde U := (Q_U W)\Sigma_k, \qquad \tilde V := Q_V Z, \qquad (5)$$
and $\Sigma_k := \operatorname{diag}(\sigma_1,\dots,\sigma_k,0,\dots,0)\in\mathbb{R}^{k'\times k'}$. The approximation (5) can be computed with linear complexity $(k')^2(|t|+|s|)$. The rank-$k$ matrix $\tilde U\tilde V^H$ is a best approximation because, due to Mirsky [14], one has
$$\min_{M\in\mathbb{C}^{|t|\times|s|}_k}\|UV^H - M\|_2 = \|UV^H - \tilde U\tilde V^H\|_2 = \|\Sigma - \Sigma_k\|_2 = \sigma_{k+1},$$
where
$$\mathbb{C}^{m\times n}_k := \{A\in\mathbb{C}^{m\times n} : \operatorname{rank} A \le k\}$$
denotes the set of matrices of rank at most $k$. Hence, if a given accuracy $\varepsilon > 0$ is to be guaranteed, one can choose
$$k(\varepsilon) := \min\{k\in\mathbb{N} : \sigma_{k+1} < \varepsilon\sigma_1\}.$$
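To make the truncation step concrete, the following minimal numpy sketch (not part of the original text; the function name, the test data, and the diagonal scaling used to create decaying singular values are purely illustrative) implements (4)-(5) together with the rank choice $k(\varepsilon)$.

```python
import numpy as np

def truncate_low_rank(U, V, eps):
    """Truncation (4)-(5) of a rank-k' product U V^H to the eps-dominant rank,
    i.e. k(eps) = min{k : sigma_{k+1} < eps * sigma_1}. Dense factors, numpy only."""
    QU, RU = np.linalg.qr(U)                          # U = Q_U R_U
    QV, RV = np.linalg.qr(V)                          # V = Q_V R_V
    W, sigma, ZH = np.linalg.svd(RU @ RV.conj().T)    # R_U R_V^H = W Sigma Z^H, cf. (4)
    k = max(int(np.sum(sigma >= eps * sigma[0])), 1)  # rank choice k(eps)
    U_new = QU @ (W[:, :k] * sigma[:k])               # (Q_U W) Sigma_k, cf. (5)
    V_new = QV @ ZH[:k, :].conj().T                   # Q_V Z
    return U_new, V_new

# usage: rank-8 factors with decaying singular values, truncated to the dominant part
rng = np.random.default_rng(0)
U = rng.standard_normal((200, 8)) @ np.diag(2.0 ** -np.arange(8))
V = rng.standard_normal((150, 8))
Ut, Vt = truncate_low_rank(U, V, eps=1e-2)
print(Ut.shape[1], np.linalg.norm(U @ V.T - Ut @ Vt.T, 2))  # kept rank and spectral error
```

In the H-matrix arithmetic such a routine would be applied to the factors of a block whenever the sum of two low-rank matrices has to be rounded back to the prescribed rank.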
2.1 Treating constraints

For many applications it is essential to exactly preserve constraints satisfied by the matrix $A$. For instance, it can be useful to preserve some eigenvectors to ensure spectral equivalence. In other applications the constant vector or translations and rotations (i.e. rigid body motions) shall remain in the null space. A disadvantage of the approximation (5) is that the constraints are disturbed by an error of order $\sigma_{k+1}$. Hence, approximation of the blocks will generally destroy the side constraints for the whole matrix. Our aim is to find an H-matrix approximation $\tilde A$ which satisfies (1) and (2).

A method to preserve given vectors was presented in [11, Sect. 6.8.1]. This approach first computes an H-matrix approximation $\hat A$ of $A$ via the usual truncation (5) and afterwards applies a rank-$\ell$ correction
$$\tilde A := \hat A + \delta A, \qquad \delta A := \delta Y\,(X^H X)^{-1}X^H, \qquad \delta Y := Y - \hat A X,$$
to preserve the constraints $AX = Y$. From $\delta A\,X = \delta Y$ it is easy to see that
$$\tilde A X = AX = Y.$$
Since the rank of $\delta A$ is bounded by $\ell$, the approximation $\tilde A$ is in $H(P,k+\ell)$ if $\hat A\in H(P,k)$. The advantage of this method is that the rank-$\ell$ correction $\delta A$ can easily be added to the H-matrix $\hat A$. On the other hand, $\ell$ has to be small for an efficient computation. The correction approach may lead to disadvantages if the approximation $\tilde A$ is sought in the form of an LU decomposition, for instance. If constraints are to be preserved for the H-LU decomposition of $A$, the update $\delta A$ cannot simply be added to the approximation. Either $\delta A$ has to be stored and applied to a vector separately, or the factors $L$ and $U$ have to be updated as in [3] if $\ell$ is small.

The previous approach is designed to globally preserve a few vectors for the H-matrix approximation $\tilde A$ of $A$. The techniques for preserving constraints presented in this article are based on a different approach. We preserve the respective part of the constraint on each block $t\times s$ of a partition by modifying the usual truncation (5). The constraint is kept during the whole approximation process rather than explicitly correcting it after the computation. In particular, this allows constraints to be preserved for high-level operations in the H-matrix arithmetic, because approximation errors occur only when truncating the sums of low-rank matrices. It will be seen that this does not only preserve the prescribed vectors $X$ but even a whole space spanned by their restrictions. Furthermore, preserving local constraints during the computation directly influences the ongoing computation of the remaining blocks and may result in a smaller relative error.
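As an illustration of this post-correction, here is a dense numpy sketch (ours, not taken from [11]; in practice $\delta A$ would be kept as a separate low-rank term rather than added entrywise):

```python
import numpy as np

def add_global_correction(A_hat, A, X):
    """Rank-l post-correction in the spirit of [11, Sect. 6.8.1] (dense sketch).

    Returns A_tilde = A_hat + (A X - A_hat X)(X^H X)^{-1} X^H, so that A_tilde X = A X.
    """
    Y = A @ X                                                # exact constraint values
    dY = Y - A_hat @ X                                       # deviation of the approximation
    dA = dY @ np.linalg.solve(X.conj().T @ X, X.conj().T)    # rank-l update delta A
    return A_hat + dA

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 100))
A_hat = A + 1e-3 * rng.standard_normal((100, 100))           # stand-in for an H-matrix approximation
X = np.ones((100, 1))
A_tilde = add_global_correction(A_hat, A, X)
print(np.linalg.norm(A_tilde @ X - A @ X))                   # approximately machine precision
```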
3 Methods to blockwise preserve vectors

In this section two methods are presented that truncate a given matrix $A\in H(P,k')$ to a hierarchical matrix $\tilde A\in H(P,k+\ell)$, $k < k'$, i.e.
$$A_{ts} = UV^H \approx \tilde A_{ts} = \tilde U\tilde V^H, \qquad t\times s\in P, \qquad (6)$$
while preserving the constraint (2) blockwise, i.e.
$$A_{ts}X_s = \tilde A_{ts}X_s, \qquad t\times s\in P. \qquad (7)$$
Afterwards, the use of these methods for the H-matrix Cholesky and LU factorization will be shown.

For simplicity, the following two methods will first concentrate on Hermitian H-matrices. Afterwards it will be shown how they can be adapted to general matrices. Any block $t\times s$ in the upper triangular part $P^+ := \{t\times s\in P : \max t < \min s\}$ of the symmetric partition $P$ also appears in the lower triangular part. Hence, (7) is equivalent to
$$A_{ts}X_s = \tilde A_{ts}X_s \quad\text{and}\quad A_{ts}^H X_t = \tilde A_{ts}^H X_t, \qquad t\times s\in P^+. \qquad (8)$$
Symmetric matrix approximations can therefore be created by preserving constraints for the matrices $A_{ts}$ and their conjugate transposes. Notice that diagonal blocks $A_{tt}$ are stored without approximation and hence trivially satisfy the constraints.
3.1 Preservation based on the Gram-Schmidt method

We first describe how to create an approximation $\tilde A_{ts}$ of a single block (6) under the constraint (8). To this end, only the components of $UV^H$ which are perpendicular to the constraints are approximated. We define
$$U' := P(X_t)U, \qquad V' := P(X_s)V \qquad (9)$$
resulting from the orthogonal projection $P(X) := I - X(X^HX)^{-1}X^H$ applied to $U$ and $V$, respectively. Notice that the product $U'V'^H$ can be approximated with at least the same accuracy. This can be seen from
$$\min_{B'\in\mathbb{C}^{|t|\times|s|}_k}\|U'V'^H - B'\|_2
= \min_{B\in\mathbb{C}^{|t|\times|s|}_k}\|U'V'^H - P(X_t)BP(X_s)\|_2
= \min_{B\in\mathbb{C}^{|t|\times|s|}_k}\|P(X_t)(UV^H - B)P(X_s)\|_2
\le \min_{B\in\mathbb{C}^{|t|\times|s|}_k}\|P(X_t)\|_2\|P(X_s)\|_2\|UV^H - B\|_2
\le \min_{B\in\mathbb{C}^{|t|\times|s|}_k}\|UV^H - B\|_2$$
due to the self-adjointness of $P(X)$ and $\|P(X)\|_2 \le 1$. Let $\tilde U'\tilde V'^H$ be the approximation of $U'V'^H$ defined by (5). For later purposes we remark that
$$X_t^H U' = 0 = X_t^H\tilde U', \qquad X_s^H V' = 0 = X_s^H\tilde V', \qquad (10)$$
because the SVD preserves the null space. After the approximation we add two low-rank matrices of rank $\ell$ and obtain the approximation $\tilde A_{ts} := \tilde U\tilde V^H$ defined by
$$\tilde U := [Y(t),\, Q(X_t),\, \tilde U'] \in \mathbb{C}^{|t|\times(k+2\ell)}, \qquad \tilde V := [Q(X_s),\, P(X_s)Y(s),\, \tilde V'] \in \mathbb{C}^{|s|\times(k+2\ell)}, \qquad (11)$$
where $Y(t) := A_{ts}X_s$, $Y(s) := A_{ts}^H X_t$, and $Q(X) := X(X^HX)^{-1}$. The following lemma shows that the definition of $\tilde A_{ts} := \tilde U\tilde V^H \in \mathbb{C}^{|t|\times|s|}_{k+2\ell}$ actually has the desired properties.

Lemma 3.1. Let $\tilde A_{ts}$ be constructed as above. Then $A_{ts} - \tilde A_{ts} = U'V'^H - \tilde U'\tilde V'^H$ and
$$\|A_{ts} - \tilde A_{ts}\|_2 \le \min_{B\in\mathbb{C}^{|t|\times|s|}_k}\|UV^H - B\|_2.$$
Furthermore, $\tilde A_{ts}$ satisfies (8).

Proof. For the first part of the assertion observe that
$$A_{ts} - \tilde A_{ts} = UV^H - [Y(t),\, Q(X_t),\, \tilde U'][Q(X_s),\, P(X_s)Y(s),\, \tilde V']^H
= UV^H - Y(t)Q(X_s)^H - Q(X_t)Y(s)^H + Q(X_t)Y(s)^HX_sQ(X_s)^H - \tilde U'\tilde V'^H
= U'V'^H - \tilde U'\tilde V'^H.$$
The vectors are preserved because, due to (10),
$$\tilde A_{ts}X_s = \tilde U\,[Q(X_s),\, P(X_s)Y(s),\, \tilde V']^HX_s = [Y(t),\, Q(X_t),\, \tilde U']\,[I,\,0,\,0]^H = Y(t)$$
and
$$\tilde A_{ts}^HX_t = \tilde V\,[Y(t),\, Q(X_t),\, \tilde U']^HX_t = [Q(X_s),\, P(X_s)Y(s),\, \tilde V']\,[X_t^HY(t),\, I,\, X_t^H\tilde U']^H
= Q(X_s)Y(t)^HX_t + Y(s) - Q(X_s)X_s^HY(s) = Y(s),$$
which follows from $Y(t)^HX_t = X_s^HY(s)$.

The approximation (11) can be adapted from symmetric to general matrices. Let $V'$ be defined as in (9). In the non-symmetric case it is sufficient to approximate the product $UV'^H$, because in contrast to symmetric problems there is no constraint for the transpose. Let $U^*\tilde V^{*H}$ denote the approximation generated by applying the usual truncation (5) to $UV'^H$. In the non-symmetric case one defines the approximation
$$\tilde U := [Y(t),\, U^*] \in \mathbb{C}^{|t|\times(k+\ell)}, \qquad \tilde V := [Q(X_s),\, \tilde V^*] \in \mathbb{C}^{|s|\times(k+\ell)}. \qquad (12)$$
As in the symmetric case, it can easily be shown that this is an approximation of $A_{ts}$ satisfying (7).

The advantage of the approximations (11) and (12) is that they are easily implemented. However, they are not best approximations respecting the constraint (8): there could still be redundancies between the orthogonal projector and the added rank-$\ell$ matrices. Another disadvantage is that this procedure has the same stability issues as the Gram-Schmidt method. A method based on Householder reflections is therefore more desirable.
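A dense numpy sketch of the non-symmetric variant (12) follows (our illustration; in the H-matrix code the factors would not be multiplied out, and the plain SVD below stands in for the factored truncation (5)):

```python
import numpy as np

def truncate_preserving(U, V, Xs, eps):
    """Constraint-preserving truncation of a low-rank block A_ts = U V^H,
    non-symmetric variant (12): truncate only U (P(X_s)V)^H and append
    Y(t) = A_ts X_s and Q(X_s), so that the result reproduces A_ts X_s."""
    Yt = U @ (V.conj().T @ Xs)                                # Y(t) = A_ts X_s
    QX = Xs @ np.linalg.inv(Xs.conj().T @ Xs)                 # Q(X_s) = X_s (X_s^H X_s)^{-1}
    V_perp = V - Xs @ np.linalg.solve(Xs.conj().T @ Xs, Xs.conj().T @ V)   # P(X_s) V
    W, s, Zh = np.linalg.svd(U @ V_perp.conj().T, full_matrices=False)     # truncation of U V'^H
    k = max(int(np.sum(s >= eps * s[0])), 1)
    U_star, V_star = W[:, :k] * s[:k], Zh[:k, :].conj().T
    return np.hstack([Yt, U_star]), np.hstack([QX, V_star])                # (12)

rng = np.random.default_rng(6)
U = rng.standard_normal((150, 12)); V = rng.standard_normal((90, 12))
Xs = np.ones((90, 1))
Ut, Vt = truncate_preserving(U, V, Xs, eps=1e-2)
print(Ut.shape[1], np.linalg.norm(Ut @ Vt.conj().T @ Xs - U @ V.conj().T @ Xs))
```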
3.2 Preservation based on Householder reflections

For the second method we again first focus on symmetric matrices and will look at more general problems later on. We take a single off-diagonal low-rank block (6) of an H-matrix and aim at approximating it under the additional conditions (8). Comparing the decomposition (4) of the rank-$k'$ matrix $A_{ts}$ and the approximation (5), we have
$$A_{ts} = \tilde U\tilde V^H + EF^H, \qquad (13)$$
where the rank-$\hat k$ matrix $EF^H$, $\hat k := k' - k$, is the remainder of the approximation. Dropping the latter term in (13) is responsible for the violation of the constraint; see the discussion in Sect. 2.1. By the following method only the part of $EF^H$ is dropped which is orthogonal to the constraint.

Let $X_t = H_1A$ and $X_s = H_2B$ be QR decompositions of $X_t$ and $X_s$, respectively. Herein, $H_1$, $H_2$ denote Householder matrices and $A\in\mathbb{C}^{|t|\times\ell}$, $B\in\mathbb{C}^{|s|\times\ell}$ are upper triangular matrices. Using the Householder matrices to transform the remainder of the approximation in (13),
$$H_1^HE =: \begin{bmatrix} C\\ G_1\end{bmatrix}, \quad C\in\mathbb{C}^{\ell\times\hat k}, \qquad (14)
\qquad
H_2^HF =: \begin{bmatrix} D\\ G_2\end{bmatrix}, \quad D\in\mathbb{C}^{\ell\times\hat k}, \qquad (15)$$
we obtain
$$EF^H = H_1\begin{bmatrix} C\\ G_1\end{bmatrix}\begin{bmatrix} D\\ G_2\end{bmatrix}^HH_2^H
= H_1\begin{bmatrix} CD^H & CG_2^H\\ G_1D^H & G_1G_2^H\end{bmatrix}H_2^H. \qquad (16)$$
The Householder matrices $H_1$ and $H_2$ have the effect that only $C$ or $D$ contributes to the conditions (8) when multiplying (13) by $X_t$ or $X_s$, because only the first $\ell$ rows of $A$ and $B$ are non-zero. Motivated by this, we neglect $G_1G_2^H$ in (16). This does not violate the constraints (8), as we shall see later on. In order to further save costs in future operations, we reduce the size of the inner $|t|\times|s|$ matrix in (16) by QR decompositions $G_1D^H = Q_1R_1$ and $G_2C^H = Q_2R_2$ with Householder matrices $Q_1\in\mathbb{C}^{(|t|-\ell)\times(|t|-\ell)}$, $Q_2\in\mathbb{C}^{(|s|-\ell)\times(|s|-\ell)}$ and upper triangular matrices $R_1 := [\hat R_1^H, 0]^H\in\mathbb{C}^{(|t|-\ell)\times\ell}$, $R_2 := [\hat R_2^H, 0]^H\in\mathbb{C}^{(|s|-\ell)\times\ell}$. This leads to
$$H_1\begin{bmatrix} CD^H & CG_2^H\\ G_1D^H & 0\end{bmatrix}H_2^H
= \underbrace{H_1\begin{bmatrix} I & 0\\ 0 & Q_1\end{bmatrix}}_{=:\hat H_1}
\begin{bmatrix} CD^H & \hat R_2^H & 0\\ \hat R_1 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}
\underbrace{\begin{bmatrix} I & 0\\ 0 & Q_2\end{bmatrix}^HH_2^H}_{=:\hat H_2^H}.$$
Finally, there might be some redundancies in the inner matrix. To get rid of them, we employ the singular value decomposition of the inner $2\ell\times2\ell$ matrix. Using the SVD
$$\begin{bmatrix} CD^H & \hat R_2^H\\ \hat R_1 & 0\end{bmatrix} =: \bar W S\bar Z^H,$$
we define the blockwise approximation
$$\tilde A_{ts} := \tilde U\tilde V^H + \hat H_1\begin{bmatrix} \bar WS\bar Z^H & 0\\ 0 & 0\end{bmatrix}\hat H_2^H
= [\tilde U,\, \hat W]\begin{bmatrix} I & 0\\ 0 & S\end{bmatrix}[\tilde V,\, \hat Z]^H \qquad (17)$$
with $\hat W := \hat H_1[\bar W^H, 0]^H$ and $\hat Z := \hat H_2[\bar Z^H, 0]^H$. Compared with the algorithm based on the Gram-Schmidt method, this method has better stability properties.

Lemma 3.2. The approximation (17) preserves the conditions (8) and $\operatorname{rank}\tilde A_{ts} \le k + 2\ell$. Furthermore, we have $A_{ts} - \tilde A_{ts} = \hat E\hat F^H$, where
$$\hat E = H_1\begin{bmatrix} 0\\ G_1\end{bmatrix}, \qquad \hat F = H_2\begin{bmatrix} 0\\ G_2\end{bmatrix},$$
and $\|\hat E\hat F^H\|_2 \le \|EF^H\|_2$.

Proof. It is easy to verify that by construction we have
$$A_{ts} = \tilde A_{ts} + \hat E\hat F^H.$$
Hence, it suffices to show that $\hat E^HX_t = 0$ and $\hat F^HX_s = 0$, which follows from
$$\hat E^HX_t = \begin{bmatrix} 0\\ G_1\end{bmatrix}^HH_1^HX_t = \begin{bmatrix} 0\\ G_1\end{bmatrix}^HH_1^HH_1A = \begin{bmatrix} 0\\ G_1\end{bmatrix}^HA = 0,$$
because at most the leading $\ell$ rows of $A$ are non-zero. Similar arguments apply for proving $\hat F^HX_s = 0$.

Another important property is that the approximation (17) is a best approximation with respect to $\|\cdot\|_2$ under the constraints (8). This is easy to see, because the decomposition (13) results from an SVD. The term $EF^H$ is only used to satisfy the constraints (8) and is truncated with another SVD to $\bar WS\bar Z^H$.

The above approximation (17) can be adapted to general matrices. If only (7) needs to be fulfilled, then in addition to $G_1G_2^H$ we can also drop $CG_2^H$ in (16). In this case, the rank of $\tilde A_{ts}$ is bounded by $k + \ell$.
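The following numpy sketch illustrates the simplified non-symmetric variant just described, in which only the part of the remainder $EF^H$ that acts on $X_s$ is kept. It is a dense illustration under our own naming; the actual method of this section works on the low-rank factors and additionally recompresses the correction.

```python
import numpy as np

def householder_preserving_truncation(U, V, Xs, k):
    """Non-symmetric constraint-preserving truncation (dense sketch).

    A_ts = U V^H is truncated to rank k by an SVD; the remainder E F^H is then
    split with the full QR factor of X_s, and only its component acting on X_s
    is kept, so A_tilde X_s = A_ts X_s and rank(A_tilde) <= k + l.
    """
    l = Xs.shape[1]
    W, s, Zh = np.linalg.svd(U @ V.conj().T, full_matrices=False)
    Ut, Vt = W[:, :k] * s[:k], Zh[:k, :].conj().T             # kept part, cf. (5)
    E,  F  = W[:, k:] * s[k:], Zh[k:, :].conj().T             # remainder E F^H in (13)
    H2, B = np.linalg.qr(Xs, mode='complete')                 # X_s = H_2 B
    D = H2[:, :l].conj().T @ F                                # first l rows of H_2^H F, cf. (15)
    correction = E @ (H2[:, :l] @ D).conj().T                 # dropped part annihilates X_s
    return Ut @ Vt.conj().T + correction

rng = np.random.default_rng(2)
U = rng.standard_normal((120, 10)); V = rng.standard_normal((80, 10))
Xs = np.ones((80, 1))
A_tilde = householder_preserving_truncation(U, V, Xs, k=4)
print(np.linalg.norm(A_tilde @ Xs - (U @ V.T) @ Xs))          # constraint preserved
```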
3.3 Hierarchical matrix algebra

In the previous section, we introduced two truncation algorithms which preserve on each block the corresponding restriction of given vectors. In this section, the effect of this preservation on the H-matrix operations will be investigated. In short, it will be seen that the results of the approximate H-matrix addition, multiplication, inversion, and LU factorization also satisfy the constraint (2).

Since the H-matrix addition is just a blockwise truncated addition of two low-rank matrices, the rounded sum of two H-matrices can obviously be modified to satisfy (2). The next step is to define a rounded multiplication which uses this truncation. For the multiplication of two H-matrices $A\in H(T_{I\times J}, k_A)$ and $B\in H(T_{J\times K}, k_B)$ assume that they are subdivided according to their block cluster trees $T_{I\times J}$ and $T_{J\times K}$:
$$A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}, \qquad B = \begin{bmatrix} B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix}.$$
Then $C := AB$ is computed recursively via
$$C = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22}\\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22}\end{bmatrix},$$
which involves multiplications of smaller size and rounded additions. Assuming that the sub-blocks of $\tilde C$ are computed such that they preserve the corresponding parts of $X$, i.e.
$$C_{ij}X_j = \tilde C_{ij}X_j, \qquad i,j = 1,2,$$
with $X := [X_1^H, X_2^H]^H$, then we even have
$$\tilde C\begin{bmatrix} X_1 & 0\\ 0 & X_2\end{bmatrix} = C\begin{bmatrix} X_1 & 0\\ 0 & X_2\end{bmatrix},$$
which in particular yields $\tilde CX = CX$. If the result $C$ of the multiplication has a coarser structure, it is necessary to merge the low-rank representations $C_{ij} = U_{ij}V_{ij}^H$ into a single low-rank representation $C = UV^H$. This reduces the number of constraints merely to $\tilde CX = CX$. Since the H-matrix LU and Cholesky factorizations are computed via divide-and-conquer operations using the H-matrix addition and multiplication, the previous arguments can be applied to prove the next lemma.

Lemma 3.3. If the restrictions of $X$ corresponding to the blocks of $A$ are preserved during truncation, then for the approximate LU decomposition $A \approx \tilde L\tilde U$ it holds that
$$(\tilde L\tilde U)_{ts}X_s = A_{ts}X_s \qquad \text{for all } t\times s\in P.$$
In particular, we have $\tilde L\tilde UX = AX$.
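As a minimal illustration (ours) of how blockwise preservation carries over to the global constraint (2), the following numpy sketch truncates the off-diagonal blocks of a 2x2 block matrix while preserving the restrictions $X_1$, $X_2$. The blocks are random and therefore not genuinely low-rank, which does not matter for the preserved constraint; block sizes and the rank are arbitrary.

```python
import numpy as np

def preserve_block(A_b, X_b, k):
    """Low-rank approximation of a block that reproduces A_b @ X_b exactly
    (dense sketch of the projection idea of Sect. 3.1)."""
    Y = A_b @ X_b                                             # blockwise constraint values
    QX = X_b @ np.linalg.inv(X_b.conj().T @ X_b)              # Q(X) = X (X^H X)^{-1}
    A_perp = A_b - Y @ QX.conj().T                            # A_b P(X_b)
    W, s, Zh = np.linalg.svd(A_perp, full_matrices=False)
    return Y @ QX.conj().T + (W[:, :k] * s[:k]) @ Zh[:k, :]

rng = np.random.default_rng(5)
n1, n2 = 60, 40
A = rng.standard_normal((n1 + n2, n1 + n2))
X1, X2 = np.ones((n1, 1)), np.ones((n2, 1))

A_tilde = A.copy()                                            # diagonal blocks stay exact
A_tilde[:n1, n1:] = preserve_block(A[:n1, n1:], X2, k=3)
A_tilde[n1:, :n1] = preserve_block(A[n1:, :n1], X1, k=3)

X = np.vstack([X1, X2])
print(np.linalg.norm(A_tilde @ X - A @ X))                    # global constraint (2) holds
```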
4 Cascadic bases

The preservation of the blockwise constraints (7),
$$A_{ts}X_s = \tilde A_{ts}X_s, \qquad t\times s\in P,$$
resulting from restricting $X$ during truncation can be utilized to obtain the global constraint (2),
$$\tilde AX = AX,$$
even for the approximate H-matrix operations. In this section, it will be seen that with a more sophisticated way of preserving vectors on each block, a significantly larger set of vectors can be preserved globally due to the hierarchical structure of $P$. The basis that will be presented uses the substructure of the H-matrix in a native way. The number of vectors that each block preserves depends on its level in the cluster tree. With the right choice of vectors which have to be preserved blockwise, it will be shown that the whole matrix preserves a set of $|\mathcal{L}(T_J)|$ vectors.

Given a vector $u\in\mathbb{C}^J$ and $s\in T_J$, we define the linear hull of the zero-extended restrictions of $u$ to the leaf clusters of $T_s$,
$$\Lambda(s) := \operatorname{span}\{\mathring u_{\hat s} : \hat s\in\mathcal{L}(T_s)\},$$
where $\mathring u_{\hat s}\in\mathbb{C}^s$ denotes the zero extension of the restriction $u_{\hat s}\in\mathbb{C}^{\hat s}$ to $\mathbb{C}^s$, i.e.
$$(\mathring u_{\hat s})_i := \begin{cases} u_i, & i\in\hat s,\\ 0, & i\in s\setminus\hat s.\end{cases}$$
The vector $u$ needs to be different from zero on the restrictions to the leaves of the tree $T_s$, i.e. $\mathring u_{\hat s}\ne 0$, $\hat s\in\mathcal{L}(T_s)$, so that $\{\mathring u_{\hat s} : \hat s\in\mathcal{L}(T_s)\}$ is a basis of $\Lambda(s)$.

The following lemma generalizes the observation that, given an approximation
$$\begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix} \approx \begin{bmatrix} \tilde A_{11} & \tilde A_{12}\\ \tilde A_{21} & \tilde A_{22}\end{bmatrix}$$
such that each block satisfies
$$\tilde A_{ij}X_j = A_{ij}X_j, \qquad i,j = 1,2,$$
then we already have
$$\begin{bmatrix} \tilde A_{11} & \tilde A_{12}\\ \tilde A_{21} & \tilde A_{22}\end{bmatrix}\begin{bmatrix} X_1 & 0\\ 0 & X_2\end{bmatrix} = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}\begin{bmatrix} X_1 & 0\\ 0 & X_2\end{bmatrix}.$$
We will see that, in order to maintain this property, the partition of H-matrices requires larger blocks to fulfill more constraints than leaves of smaller size.

Lemma 4.1. Let $t\times s\in T_{I\times J}\setminus P$ and let $\tilde A_{t's'}$ preserve $\Lambda(s')$ for all $s'\in S(s)$ and $t'\in S(t)$. Then $\tilde A_{ts}$ preserves $\Lambda(s)$.

Proof. According to the assumption, each block $t'\times s'$ of $\tilde A$ preserves $\Lambda(s')$, i.e.
$$\tilde A_{t's'}\mathring u_{\hat s'} = A_{t's'}\mathring u_{\hat s'}, \qquad \hat s'\in\mathcal{L}(T_{s'}).$$
Since $(\mathring u_{\hat s})_{s'} = 0$ for $\hat s\cap s' = \emptyset$, this can be extended to
$$\tilde A_{t's'}(\mathring u_{\hat s})_{s'} = A_{t's'}(\mathring u_{\hat s})_{s'}, \qquad \hat s\in\mathcal{L}(T_s),$$
and hence
$$\sum_{s'\in S(s)}\tilde A_{t's'}(\mathring u_{\hat s})_{s'} = \sum_{s'\in S(s)} A_{t's'}(\mathring u_{\hat s})_{s'}, \qquad \hat s\in\mathcal{L}(T_s).$$
Since $\mathring u_{\hat s}$ is composed of the restrictions $(\mathring u_{\hat s})_{s'}$, $s'\in S(s)$, for all $\hat s\in\mathcal{L}(T_s)$ we obtain
$$\tilde A_{t's}\mathring u_{\hat s} = \sum_{s'\in S(s)}\tilde A_{t's'}(\mathring u_{\hat s})_{s'} = \sum_{s'\in S(s)} A_{t's'}(\mathring u_{\hat s})_{s'} = A_{t's}\mathring u_{\hat s}, \qquad \hat s\in\mathcal{L}(T_s),$$
for all $t'\in S(t)$.

According to the previous lemma, it is sufficient to impose few conditions on the leaves in order to preserve a large number of vectors on a global level. The next lemma shows how many vectors are preserved by the whole matrix.

Lemma 4.2. Let each block $\tilde A_{ts}$, $t\times s\in P$, preserve $\Lambda(s)$. Then $\tilde A$ preserves $\Lambda(J)$. The number of linearly independent vectors preserved by $\tilde A$ is $|\mathcal{L}(T_J)|$.

Proof. According to the assumption, each block $t\times s\in P$ preserves $\Lambda(s)$. It follows from inductively applying Lemma 4.1, going from the leaves to the root, that all blocks $t\times s\in T_{I\times J}$ preserve $\Lambda(s)$. In particular, $\tilde A$ preserves $\Lambda(J)$. It is obvious that
$$\dim\Lambda(s) = \sum_{s'\in S(s)}\dim\Lambda(s'). \qquad (18)$$
Since $\dim\Lambda(s) = 1$ for all $s\in\mathcal{L}(T_J)$, the dimension of $\Lambda(J)$ equals the number $|\mathcal{L}(T_J)|$ of leaves in $T_J$.
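For intuition, the following toy numpy construction (ours; a one-dimensional index set split by bisection stands in for a real cluster tree, and nmin is the minimal cluster size) builds the cascadic vectors $\mathring u_{\hat s}$ for the leaves of $T_s$ and confirms that they span a space of dimension $|\mathcal{L}(T_s)|$:

```python
import numpy as np

def leaf_index_sets(s, nmin):
    """Leaves of a toy cluster tree T_s obtained by recursive bisection of s."""
    if len(s) <= nmin:
        return [s]
    mid = len(s) // 2
    return leaf_index_sets(s[:mid], nmin) + leaf_index_sets(s[mid:], nmin)

def cascadic_basis(u, s, nmin):
    """Zero extensions of the restrictions of u to the leaves of T_s,
    i.e. a basis of Lambda(s) for the given vector u."""
    basis = []
    for leaf in leaf_index_sets(s, nmin):
        v = np.zeros(len(u))
        v[leaf] = u[leaf]              # zero extension of the restriction
        basis.append(v)
    return np.column_stack(basis)

u = np.ones(16)                        # e.g. the constant vector
X = cascadic_basis(u, np.arange(16), nmin=4)
print(X.shape, np.linalg.matrix_rank(X))   # (16, 4) 4, i.e. dim Lambda(J) = |L(T_J)|
```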
The number $|\mathcal{L}(T_J)|$ of leaves of $T_J$ is of the order $|J|$. Hence, for the number of linearly independent vectors preserved by $\tilde A$ it follows that $\dim\Lambda(J)\sim|J|$. Preserving $O(|J|)$ vectors almost completely determines the whole matrix. By (18) the dimension of $\Lambda(s)$ is the sum of the dimensions of all $\Lambda(s')$, $s'\in S(s)$. From this it follows that larger blocks in $P$ have to fulfill more constraints than smaller blocks. To be precise, for any leaf $t\times s\in P$ the depth of the subtree $T_s$ of $T_J$ rooted at $s$ determines the dimension of $\Lambda(s)$. Additionally, as we shall see later, the logarithmic-linear complexity of the resulting H-matrices in terms of storage is destroyed in this case.

To avoid this, the depth $q$ of the smallest clusters $s\in T_J$ for which $\mathring u_s$ is preserved globally needs to be reduced. Blocks $t\times s$ with $\mathrm{level}(s)$ greater than $q$ only preserve the restricted vector $u_s$. Here, the depth $\mathrm{level}(s')$ denotes the distance of $s'\in T_s$ to the root $J$, independently of $s\in T_J$. The reduced basis for a cluster $s\in\{\hat s\in T_J : \exists\, t\times\hat s\in P\}$ is defined in the following way:
$$\Lambda_q(s) := \begin{cases} \operatorname{span}\{\mathring u_{\hat s} : \hat s\in\mathcal{L}_q(T_s)\}, & \mathrm{level}(s) < q,\\ \operatorname{span}\{u_s\}, & \text{otherwise},\end{cases}$$
where $\mathcal{L}_q(T_s) := \{s'\in T_s : \mathrm{level}(s') = q\}$. The next lemma states an estimate similar to Lemma 4.2 for the number of linearly independent vectors that are preserved globally by a reduced basis. The level of a block is defined by its distance from the root of the block cluster tree.

Lemma 4.3. Let each block $\tilde A_{ts}$, $t\times s\in P$, preserve $\Lambda_q(s)$. Then the number of linearly independent vectors preserved by $\tilde A$ is at least $|\mathcal{L}_{\hat q}(T_J)|$ with $\hat q := \max(q,\ \min\{\mathrm{level}(t\times s) : t\times s\in P\})$.

Proof. If the depth of the cluster trees $T_I$ and $T_J$ is virtually restricted to $\hat q$, then Lemma 4.2 can be applied.

Fig. 1 depicts the vectors $u_s$ and $\mathring u_{\hat s}$, $\hat s\in\mathcal{L}_2(T_s)$, preserved in the case of the constant vector $u = 1$. Off-diagonal blocks are considered to be low-rank, while diagonal blocks are refined recursively. The white blocks in Fig. 1 are treated without approximation, whereas the green blocks are approximated by low-rank matrices preserving the contained vectors.
Figure 1: An H-matrix and its preserved vectors for q̂ = 2.

The constant q̂ of Lemma 4.3 can be large for certain applications. This means that even though only a single vector is preserved per block, the total number of globally preserved linearly independent vectors might be huge due to the structure of the partition. One application of this type is nested dissection reorderings of FE systems of elliptic operators; cf. [4]. Admissible blocks only appear in the separator, which, depending on the sparsity pattern, is relatively small.
Storage requirements vs. number of preserved vectors

The preservation of given vectors increases the blockwise rank by the number of preserved vectors and thus the storage requirements. The number of conditions for a block $t\times s$ is given by
$$\begin{cases} |\mathcal{L}_q(T_s)|, & \mathrm{level}(s) < q,\\ 1, & \text{otherwise}.\end{cases}$$
We assume that the number of sons $\kappa := |S(s)|$ does not depend on $s\in T_I$. This is in line with the usual choice $\kappa = 2$ or $\kappa = 3$. In this case, the number of conditions for a block $t\times s$ from the $\ell$-th level of $T_{I\times J}$ is
$$c^*(\ell) := \begin{cases} \kappa^{q-\ell}, & \ell < q,\\ 1, & \text{otherwise}.\end{cases}$$
Hence, the storage required for $t\times s\in P$ is bounded by
$$N_{st}(A_b) \le \hat k\,(|t|+|s|)$$
with $\hat k := k + c^*$. Here, $k$ is the rank of a usual low-rank approximation as defined in (5). Non-admissible blocks $A_{ts}$ are small, i.e. $\min\{|s|,|t|\}\le n_{\min}$ for some $n_{\min}\in\mathbb{N}$. These blocks are stored entrywise and therefore require
$$|t||s| = \min\{|t|,|s|\}\,\max\{|t|,|s|\} \le n_{\min}(|t|+|s|)$$
units of storage. Hence, at most $\max\{\hat k, n_{\min}\}(|t|+|s|)$ units of storage are needed for $A_{ts}$. With (3) it follows that
$$\sum_{t\times s\in P}\max\{k, n_{\min}\}(|t|+|s|) \le c_{sp}\max\{k, n_{\min}\}\,[L(T_I)|I| + L(T_J)|J|].$$
The number $c^*$ of conditions per block depends on the level. Hence, a more sophisticated estimate is needed. First, observe that
$$\sum_{t\times s\in T_{I\times J}} c^*(\ell)(|t|+|s|)
= \sum_{t\in T_I}\sum_{\{s\in T_J:\,t\times s\in T_{I\times J}\}} c^*(\ell)|t|
+ \sum_{s\in T_J}\sum_{\{t\in T_I:\,t\times s\in T_{I\times J}\}} c^*(\ell)|s|.$$
For the first term we obtain
$$\sum_{t\in T_I}\sum_{\{s\in T_J:\,t\times s\in T_{I\times J}\}} c^*(\ell)|t|
= \sum_{\ell=0}^{L(T_I)-1}\ \sum_{\{t\in T_I:\,\mathrm{level}(t)=\ell\}}\ \sum_{\{s\in T_J:\,t\times s\in T_{I\times J}\}} c^*(\ell)|t|
\le c_{sp}\sum_{\ell=0}^{L(T_I)-1} c^*(\ell)\sum_{\{t\in T_I:\,\mathrm{level}(t)=\ell\}}|t|
= c_{sp}|I|\sum_{\ell=0}^{L(T_I)-1} c^*(\ell)
= c_{sp}|I|\left(\kappa^q\sum_{\ell=0}^{q}\kappa^{-\ell} + L(T_I)-q-1\right)
= c_{sp}|I|\left(\frac{\kappa^{q+1}-1}{\kappa-1} + L(T_I)-q-1\right).$$
If $q$ is chosen as $q\sim\log_\kappa(kL(T_I))$, we obtain
$$\sum_{t\in T_I}\sum_{\{s\in T_J:\,t\times s\in T_{I\times J}\}} c^*(\ell)|t| \sim c_{sp}\,kL(T_I)|I|,$$
provided that $L(T_I)\approx L(T_J)$. Hence, for this choice of $q$ the storage required when preserving vectors is asymptotically the same as without preservation:
$$N_{st}(A) \sim c_{sp}\max\{k, n_{\min}\}\,[L(T_I)|I| + L(T_J)|J|].$$
The results of this section are summarized in the following theorem.

Theorem 4.1. Assume that each block $\tilde A_{ts}$, $t\times s\in P$, preserves $\Lambda_q(s)$ with $q\sim\log(k\log|I|)$. Then $\tilde A$ requires $O(k[|I|\log|I| + |J|\log|J|])$ units of storage and preserves $O(k\log|J|)$ linearly independent vectors.

Proof. The total number of vectors that are preserved by the whole matrix is
$$|\mathcal{L}_q(T_J)| \sim \kappa^q \sim kL(T_{I\times J}) \sim k\log|J|.$$
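A quick numerical check (ours, with arbitrary values of $\kappa$, $L(T_I)$, and $q$) of the level sum evaluated above:

```python
# Extra low-rank columns caused by the cascadic constraints for a binary
# cluster tree (kappa = 2) of depth L and cut-off level q, cf. c*(l) above.
kappa, L, q = 2, 20, 5                         # illustrative values

def conditions_per_block(level):
    return kappa ** (q - level) if level < q else 1

total = sum(conditions_per_block(l) for l in range(L))
closed_form = (kappa ** (q + 1) - 1) // (kappa - 1) + (L - q - 1)
print(total, closed_form)                      # both 77: geometric part plus one per deeper level
```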
5 Preservation of vectors and preconditioning

One of the applications of the blockwise preservation of vectors is preconditioning. Usually, H-matrix approximations $\tilde A$ of a given matrix $A\in\mathbb{C}^{n\times n}$ are constructed such that
$$\|A - \tilde A\|_2 \le \varepsilon\|A\|_2. \qquad (19)$$
Assume that $A$ is Hermitian. In this case, the previous condition (19) guarantees that for the $i$-th largest eigenvalue $\lambda_i$ it holds that
$$|\lambda_i(\tilde A) - \lambda_i(A)| \le \varepsilon|\lambda_1|$$
due to Weyl's theorem [17]. Hence, the relative accuracy of all large eigenvalues $\lambda_1(\tilde A),\dots,\lambda_p(\tilde A)$ of the same order as $\lambda_1(\tilde A)$ is approximately $\varepsilon$. In contrast to that, the relative accuracy of the remaining eigenvalues $\lambda_q(\tilde A)$, $q = p+1,\dots,n$, is only $\varepsilon|\lambda_1|/|\lambda_q|$. Therefore, an absolute error of order $\varepsilon\|A\|_2$ does not necessarily result in a good preconditioner $\tilde A$ unless $A$ is well-conditioned or other additional criteria can be used. To avoid this, we could choose a smaller $\varepsilon$, which increases the computational complexity of the H-matrix approximation. Another idea is to preserve the eigenvectors (or at least approximations of them) corresponding to small eigenvalues. In the case of matrices arising from FE systems of elliptic operators, these eigenvectors are smooth, so it can be expected that they can be approximated well by a relatively small cascadic space.

The following lemma justifies the idea of preserving eigenvectors corresponding to small eigenvalues. For Hermitian positive definite matrices $A$ it is known that the condition number satisfies
$$\operatorname{cond}(A) = \|A\|_2\|A^{-1}\|_2 = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}.$$
Here, $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ denote the eigenvalues of $A$ of largest and smallest modulus, respectively.

Lemma 5.1. Let $A\in\mathbb{C}^{n\times n}$ be Hermitian and positive definite and let $\tilde A\in\mathbb{C}^{n\times n}$ be Hermitian and satisfy (19). Denote by $\lambda_1\ge\dots\ge\lambda_n>0$ the eigenvalues of $A$ and by $v_1,\dots,v_n\in\mathbb{C}^n$ the associated basis of orthonormal eigenvectors. Let
$$\lambda_i \ge \tau\|A\|_2, \qquad i = 1,\dots,p,$$
where $1 > \tau > \varepsilon$ is some constant. Suppose that there exist test vectors $z_{p+1},\dots,z_n\in\mathbb{C}^n$ such that
$$Az_i = \tilde Az_i \quad\text{and}\quad \|v_i - z_i\|_2^2 \le \frac{\lambda_i}{\tau\|A\|_2}$$
for all $i = p+1,\dots,n$. Then $\tilde A$ is also positive definite,
$$\lambda_{\max}(\tilde A^{-1}A) \le \frac{\tau}{\tau-\varepsilon}, \qquad \lambda_{\min}(\tilde A^{-1}A) \ge \frac{\tau}{\tau+\varepsilon}.$$
In particular, the condition number of $\tilde A^{-1/2}A\tilde A^{-1/2}$ is bounded by
$$\operatorname{cond}(\tilde A^{-1/2}A\tilde A^{-1/2}) = \frac{\lambda_{\max}(\tilde A^{-1}A)}{\lambda_{\min}(\tilde A^{-1}A)} \le \frac{\tau+\varepsilon}{\tau-\varepsilon}.$$

Proof. Setting $E := A - \tilde A$, we note that
$$Ev_i = (A-\tilde A)v_i = (A-\tilde A)(v_i - z_i) = E(v_i - z_i), \qquad i > p.$$
We now consider three different cases to estimate $|v_i^HEv_j|/\sqrt{\lambda_i\lambda_j}$. First of all, if $i,j\le p$, then we find that
$$\frac{|v_i^HEv_j|}{\sqrt{\lambda_i\lambda_j}} \le \frac{\varepsilon\|A\|_2}{\sqrt{\lambda_i\lambda_j}} \le \frac{\varepsilon}{\tau}.$$
Second, if $i\le p$ but $j>p$, then
$$\frac{|v_i^HEv_j|}{\sqrt{\lambda_i\lambda_j}} = \frac{|v_i^HE(v_j - z_j)|}{\sqrt{\lambda_i\lambda_j}}
\le \frac{\|E\|_2\|v_j - z_j\|_2}{\sqrt{\lambda_i\lambda_j}}
\le \varepsilon\frac{\|A\|_2}{\sqrt{\lambda_i}}\frac{1}{\sqrt{\tau\|A\|_2}}
= \frac{\varepsilon}{\sqrt{\tau}}\sqrt{\frac{\|A\|_2}{\lambda_i}} \le \frac{\varepsilon}{\tau}.$$
Obviously, the same relation holds if we interchange the roles of $i$ and $j$. Finally, for the third case we assume that $i,j>p$. In this case we get
$$\frac{|v_i^HEv_j|}{\sqrt{\lambda_i\lambda_j}} = \frac{|(v_i - z_i)^HE(v_j - z_j)|}{\sqrt{\lambda_i\lambda_j}}
\le \|E\|_2\frac{\|v_i - z_i\|_2\|v_j - z_j\|_2}{\sqrt{\lambda_i\lambda_j}}
\le \varepsilon\|A\|_2\frac{1}{\tau\|A\|_2} = \frac{\varepsilon}{\tau}.$$
To summarize, we have shown that for any $i,j = 1,\dots,n$
$$\frac{|v_i^HEv_j|}{\sqrt{\lambda_i\lambda_j}} \le \frac{\varepsilon}{\tau} < 1.$$
Using the unitary similarity transformation given by $V = [v_1,\dots,v_n]$, we obtain
$$V^HAV = \Lambda := \operatorname{diag}(\lambda_1,\dots,\lambda_n) = V^H\tilde AV + V^HEV.$$
It can be seen that $V\Lambda^{-1}V^H\tilde A$ is similar to $\Lambda^{-1/2}V^H\tilde AV\Lambda^{-1/2}$, and it follows that
$$\Lambda^{-1/2}V^H\tilde AV\Lambda^{-1/2} = I - \Lambda^{-1/2}V^HEV\Lambda^{-1/2}. \qquad (20)$$
Weyl's theorem yields
$$\lambda_{\max}(A^{-1}\tilde A) \le 1 + \|\Lambda^{-1/2}V^HEV\Lambda^{-1/2}\|_2 \quad\text{and}\quad \lambda_{\min}(A^{-1}\tilde A) \ge 1 - \|\Lambda^{-1/2}V^HEV\Lambda^{-1/2}\|_2.$$
Inequality (20) implies in particular that all eigenvalues of $A^{-1}\tilde A$ and
$$A^{1/2}(A^{-1}\tilde A)A^{-1/2} = A^{-1/2}\tilde AA^{-1/2}$$
are positive. By Sylvester's law of inertia, $\tilde A$ must already be positive definite. Furthermore, from (20) we conclude that
$$\lambda_{\max}(\tilde A^{-1}A) = \frac{1}{\lambda_{\min}(A^{-1}\tilde A)} \le \frac{1}{1-\frac{\varepsilon}{\tau}} = \frac{\tau}{\tau-\varepsilon}
\quad\text{and}\quad
\lambda_{\min}(\tilde A^{-1}A) = \frac{1}{\lambda_{\max}(A^{-1}\tilde A)} \ge \frac{1}{1+\frac{\varepsilon}{\tau}} = \frac{\tau}{\tau+\varepsilon},$$
which completes the proof.

Lemma 5.1 states that any approximation $\tilde A$ of $A$ may be relatively crude on the invariant subspace of $A$ corresponding to large eigenvalues. For the small eigenvalues $\lambda_i$, $i = p+1,\dots,n$, the test vectors $z_i$ have to approximate the associated eigenvectors the more accurately the smaller the eigenvalue is. We emphasize that Lemma 5.1 does not require $z_{p+1},\dots,z_n$ to be linearly independent. That is, if we have a set of test vectors $\{x_s,\ s = 1,\dots,\ell\}$ which are preserved, this also holds for any linear combination of these test vectors. Thus we can individually decide for each eigenvector $v_i$, $i > p$, which linear combination $z_i = \alpha_1^{(i)}x_1 + \dots + \alpha_\ell^{(i)}x_\ell$ of test vectors is best suited to approximate the eigenvector $v_i$. This allows in principle to choose $\ell \ll n - p$ as long as
$$\Big\|v_i - [x_1,\dots,x_\ell]\begin{bmatrix}\alpha_1^{(i)}\\ \vdots\\ \alpha_\ell^{(i)}\end{bmatrix}\Big\|_2$$
is sufficiently small.
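The following small numpy experiment (ours, not from the paper; all sizes and tolerances are illustrative) checks the bounds of Lemma 5.1 in the simplest setting where the test vectors are the exact eigenvectors, $z_i = v_i$:

```python
import numpy as np

# A is SPD with ||A||_2 = 1; A_tilde coincides with A on the eigenvectors of the
# small eigenvalues and is perturbed by at most eps on the remaining subspace.
rng = np.random.default_rng(3)
n, p, eps, tau = 200, 20, 1e-3, 1e-1

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))    # orthonormal eigenvectors
lam = np.logspace(0, -6, n)                         # eigenvalues 1 = lam_1 >= ... > 0
lam[:p] = np.maximum(lam[:p], tau)                  # lam_i >= tau ||A||_2 for i <= p
A = (Q * lam) @ Q.T

E = rng.standard_normal((n, n)); E = (E + E.T) / 2
E *= eps / np.linalg.norm(E, 2)                     # ||A - A_tilde||_2 <= eps ||A||_2
P = Q[:, :p] @ Q[:, :p].T                           # projector onto the "large" eigenspace
E = P @ E @ P                                       # so that A_tilde v_i = A v_i for i > p
A_tilde = A - E                                     # Hermitian, satisfies (19)

L = np.linalg.cholesky(A_tilde)
M = np.linalg.solve(L, np.linalg.solve(L, A).T)     # L^{-1} A L^{-T}
ev = np.linalg.eigvalsh((M + M.T) / 2)              # eigenvalues of A_tilde^{-1} A
print(ev.max(), "<=", tau / (tau - eps))            # bound on lambda_max
print(ev.min(), ">=", tau / (tau + eps))            # bound on lambda_min
```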
Relaxed preservation

Numerical experiments show that $q$ can be chosen independently of the numbers of unknowns $|I|$ and $|J|$ to obtain a good preconditioner. Furthermore, it seems that preserving a single vector per block, which significantly improves the complexity, is enough. This seems to contradict the arguments of the previous lemma, because preserving a single vector per block does not lead to a sufficiently large space of vectors preserved by the whole matrix $\tilde A$. In the rest of this section, we try to give some algebraic arguments why neglecting blockwise constraints only slightly influences the number of iterations when $\tilde A$ is used as a preconditioner.

To simplify the discussion, we only consider the Hermitian positive definite case. Suppose that our approximation $\tilde A = \tilde L\tilde L^H\in\mathbb{C}^{n\times n}$ satisfies
$$A = \tilde A + \sigma WW^H + E,$$
where $\sigma = \pm1$, $\|\tilde L^{-1}E\tilde L^{-H}\|_2 \le \varepsilon < 1$, and $W\in\mathbb{C}^{n\times k}$ corresponds to the neglected blockwise constraints. The preconditioned system can then be read as
$$B := \tilde L^{-1}A\tilde L^{-H} = I + \sigma\tilde W\tilde W^H + \tilde E,$$
where $\|\tilde E\|_2 \le \varepsilon < 1$ and $\tilde W\in\mathbb{C}^{n\times k}$. The conjugate gradient method for solving systems of the type $Bx = c$ constructs iterates $x_\ell\approx x$, and the residual satisfies $r_\ell := c - Bx_\ell = q_\ell(B)r_0$. Here $r_0 = c - Bx_0$ denotes the initial residual and $q_\ell(z)$ is a polynomial of degree $\ell$ normalized by $q_\ell(0) = 1$; cf. [7]. Let $K\in\mathbb{C}^{n\times n}$ be unitary such that $K^HBK = \Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_n)$. The conjugate gradient method is known to minimize the energy norm of the error, $\|x - x_\ell\|_B^2 = (x - x_\ell)^HB(x - x_\ell)$, which can be traced back to
$$\|x - x_\ell\|_B \le \|x - x_0\|_B\cdot\min_{q_\ell(0)=1}\max_{j=1,\dots,n}|q_\ell(\lambda_j)|.$$
Using the theorem of Weyl, we may assume that $\lambda_1\ge\dots\ge\lambda_n$, and we denote by $\mu_1\ge\dots\ge\mu_n$ the eigenvalues of $I + \sigma\tilde W\tilde W^H$. Then it follows that
$$|\lambda_i - \mu_i| \le \|\tilde E\|_2 \le \varepsilon < 1.$$
We remark that $I + \sigma\tilde W\tilde W^H$ has $\mu = 1$ as an eigenvalue of multiplicity $n-k$ and further $k$ eigenvalues associated with the low-rank part. Depending on the sign of $\sigma$, the eigenvalues of $I + \sigma\tilde W\tilde W^H$ are either
$$\sigma = +1:\quad \mu_{k+1} = \dots = \mu_n = 1, \qquad \mu_i = 1 + \sigma_i^2,\ i = 1,\dots,k,$$
or
$$\sigma = -1:\quad \mu_1 = \dots = \mu_{n-k} = 1, \qquad \mu_i = 1 - \sigma_{n-i+1}^2,\ i = n-k+1,\dots,n,$$
where $\sigma_1\ge\dots\ge\sigma_k>0$ denote the singular values of $\tilde W$ in decreasing order. If $\sigma = +1$, then we obtain from Weyl's theorem that
$$\lambda_{k+1},\dots,\lambda_n\in[1-\varepsilon,\,1+\varepsilon], \qquad \lambda_1,\dots,\lambda_k\in[1+\sigma_k^2-\varepsilon,\,1+\sigma_1^2+\varepsilon].$$
Similarly, for $\sigma = -1$ we obtain
$$\lambda_1,\dots,\lambda_{n-k}\in[1-\varepsilon,\,1+\varepsilon], \qquad \lambda_{n-k+1},\dots,\lambda_n\in[1-\sigma_1^2-\varepsilon,\,1-\sigma_k^2+\varepsilon].$$
Without loss of generality we may assume that the smallest singular value of $\tilde W$ is well separated from $\varepsilon$, i.e., we assume that $2\varepsilon\ll\sigma_k^2$. Otherwise we could take the associated rank-one matrix $\sigma_k^2v_kv_k^H$ away from $\tilde W\tilde W^H$ and add it to $\tilde E$. In that case, part of the low-rank matrix would still be considered as part of the perturbation $\tilde E$, and certainly only those low-rank matrices $\tilde W\tilde W^H$ are of interest that are large enough to be distinguished from the noise part. Notay [15] discusses convergence bounds for CG in the presence of rounding errors, where perturbed isolated eigenvalues are located at the lower or the upper end of the spectrum. Here, the perturbed eigenvalue is $\mu = 1$. Following these ideas, we can bound $|q_\ell(\lambda_j)|$ by
$$\min_{q_\ell(0)=1}\max_{j=1,\dots,n}|q_\ell(\lambda_j)| \le \min_{q_{\ell-1}(0)=1}\max_{j=1,\dots,n}|q_{\ell-1}(\lambda_j)|\,|1-\lambda_j|.$$
As long as we assume that µ = 1 is an isolated eigenvalue at the lower end of the spectrum or if the rank k is small, the term |1 − λj | will be good enough for establishing bounds. Otherwise, when k becomes larger and µ = 1 refers to a perturbed isolated eigenvalue at the upper end of the spectrum, a different approach using a Chebyshev polynomial of appropriate degree s should be chosen (cf. [15]). Here we will assume that k is not too large so that we will keep using |1 − λj |.
For the case $\sigma = +1$ we can easily estimate $|q_\ell(\lambda_j)|$ using the Chebyshev polynomials
$$q_{\ell-1}(\lambda) = \frac{c_{\ell-1}(\lambda)}{c_{\ell-1}(0)}$$
with respect to the interval $[1+\sigma_k^2-\varepsilon,\,1+\sigma_1^2+\varepsilon]$. With arguments analogous to the standard conjugate gradient method [7] we obtain
$$\min_{q_\ell(0)=1}\max_{j=1,\dots,n}|q_\ell(\lambda_j)|
\le \min_{q_{\ell-1}(0)=1}\max\Big\{\varepsilon\max_{j>k}|q_{\ell-1}(\lambda_j)|,\ \sigma_1^2\max_{j\le k}|q_{\ell-1}(\lambda_j)|\Big\}
\le \max\Big\{\varepsilon\underbrace{\max_{x\in[1-\varepsilon,1+\varepsilon]}\frac{|c_{\ell-1}(x)|}{|c_{\ell-1}(0)|}}_{\le 1},\ \sigma_1^2\max_{x\in[1+\sigma_k^2-\varepsilon,\,1+\sigma_1^2+\varepsilon]}\frac{|c_{\ell-1}(x)|}{|c_{\ell-1}(0)|}\Big\}
\le 2\sigma_1^2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{\ell-1},$$
where $\kappa = \frac{1+\sigma_1^2+\varepsilon}{1+\sigma_k^2-\varepsilon}$. Here we used that $\frac{\varepsilon}{\sigma_k^2}\le\frac12$. The convergence thus primarily depends on the low-rank part, and the isolated cluster at $\mu = 1$ only causes a minor delay at this point. Even the low-rank part only requires a moderate number of iteration steps when $k$ is small. As soon as the number of steps approaches $k$, $\max_{j\le k}|q_\ell(\lambda_j)|$ becomes negligible, since this part becomes zero for
$$q_\ell(\lambda) = q_{\ell-k}(\lambda)\prod_{j=1}^{k}\frac{|\lambda-\lambda_j|}{|\lambda_j|}. \qquad (21)$$
At this point convergence can be viewed as turning back to minimizing
$$\min_{q_\ell(0)=1}\max_{j>k} q_\ell(\lambda_j) \le \max_{x\in[1-\varepsilon,1+\varepsilon]}\frac{|c_{\ell-k}(x)|}{|c_{\ell-k}(0)|},$$
which can be bounded by
$$2\left(\frac{\sqrt{\kappa_\varepsilon}-1}{\sqrt{\kappa_\varepsilon}+1}\right)^{\ell-k}$$
in terms of the remaining condition number $\kappa_\varepsilon = \frac{1+\varepsilon}{1-\varepsilon}$. But by construction of the hierarchical matrix approximation, $\kappa_\varepsilon$ is expected to be small, i.e., only a moderate number of additional iteration steps is required. Similar arguments applied to the case $\sigma = -1$, based on Chebyshev polynomials with respect to $[1-\sigma_1^2-\varepsilon,\,1-\sigma_k^2+\varepsilon]$, yield
$$\min_{q_\ell(0)=1}\max_{j=1,\dots,n}|q_\ell(\lambda_j)| \le \max\left\{\varepsilon\max_{x\in[1-\varepsilon,1+\varepsilon]}\frac{|c_{\ell-1}(x)|}{|c_{\ell-1}(0)|},\ 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{\ell-1}\right\},$$
where $\kappa = \frac{1-\sigma_k^2+\varepsilon}{1-\sigma_1^2-\varepsilon}$. In this case, $\sigma_1\le 1$ is already necessary for $B$ to be positive definite. Unlike the case $\sigma = +1$, for $\sigma = -1$ we have
$$\max_{x\in[1-\varepsilon,1+\varepsilon]}\frac{|c_{\ell-1}(x)|}{|c_{\ell-1}(0)|} \sim \left(\frac{1+\varepsilon}{1-\frac{\sigma_1^2+\sigma_k^2}{2}}\right)^{\ell-1},$$
which may be captured by the additional factor $\varepsilon$ as long as $k$ is small. So again the initial up to $k$ steps can be read as being dominated by the low-rank part and its condition number. Similar to (21), for $\ell\ge k$, using
$$q_\ell(\lambda) = q_{\ell-k}(\lambda)\prod_{j>n-k}\frac{|\lambda-\lambda_j|}{|\lambda_j|},$$
the convergence can be described by
$$\min_{q_\ell(0)=1}\max_{j\le n-k} q_\ell(\lambda_j)
\le \prod_{j>n-k}\frac{\sigma_j^2+2\varepsilon}{1-\sigma_j^2-\varepsilon}\cdot\max_{x\in[1-\varepsilon,1+\varepsilon]}\frac{|c_{\ell-k}(x)|}{|c_{\ell-k}(0)|}
\le 2\prod_{j>n-k}\frac{\sigma_j^2+2\varepsilon}{1-\sigma_j^2-\varepsilon}\cdot\left(\frac{\sqrt{\kappa_\varepsilon}-1}{\sqrt{\kappa_\varepsilon}+1}\right)^{\ell-k}.$$
Since in this case $0 < \sigma_j < 1$ and $k$ is assumed to be small, we expect the influence of the product $\prod_{j>n-k}\frac{\sigma_j^2+2\varepsilon}{1-\sigma_j^2-\varepsilon}$ to be mild, so that the convergence behaviour is analogous to the case $\sigma = +1$.
In practice, when using an H-matrix approximation, we preserve fewer vectors when only preserving $\Lambda_q(s)$ compared with the full preservation of $\Lambda(s)$. In the symmetric case any low-rank part that is neglected in the lower triangular part has a counterpart in the upper triangular part. That is, the skipped low-rank corrections are of the form $W_1W_2^H + W_2W_1^H$ and can obviously be written as a difference of two Hermitian semi-definite low-rank matrices,
$$W_1W_2^H + W_2W_1^H = \frac12(W_1+W_2)(W_1+W_2)^H - \frac12(W_1-W_2)(W_1-W_2)^H.$$
So a more realistic representation of $A$ would be $A = \tilde A + W_1W_2^H + W_2W_1^H + E$. This in turn explains why both cases, $\sigma = +1$ and $\sigma = -1$, were discussed. In summary, we expect not significantly more than $k$ additional iteration steps. This justifies that it is advantageous to neglect some constraints, in particular for large blocks.
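A small numpy experiment (ours, not from the paper; all sizes, scalings, and tolerances are illustrative) makes the discussed iteration-count behaviour tangible: CG applied to $B = I \pm \tilde W\tilde W^H + \tilde E$ needs only on the order of $k$ steps more than for the perturbed identity alone.

```python
import numpy as np

def cg_iterations(B, b, tol=1e-10, maxit=1000):
    """Plain conjugate gradient; returns the number of steps to reach ||r|| <= tol*||b||."""
    x = np.zeros_like(b); r = b.copy(); p = r.copy(); rs = r @ r
    for it in range(1, maxit + 1):
        Bp = B @ p
        alpha = rs / (p @ Bp)
        x += alpha * p; r -= alpha * Bp
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * np.linalg.norm(b):
            return it
        p = r + (rs_new / rs) * p; rs = rs_new
    return maxit

# Preconditioned matrix as in the discussion above: identity plus a rank-k part
# plus a symmetric perturbation of norm eps.
rng = np.random.default_rng(4)
n, k, eps = 1000, 5, 1e-2
W = 0.5 * rng.standard_normal((n, k)) / np.sqrt(n)       # singular values well below 1
Etil = rng.standard_normal((n, n)); Etil = (Etil + Etil.T) / 2
Etil *= eps / np.linalg.norm(Etil, 2)
b = rng.standard_normal(n)
for sigma in (+1, -1):
    B = np.eye(n) + sigma * (W @ W.T) + Etil
    print(sigma, cg_iterations(B, b))                    # roughly k extra steps beyond O(1)
```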
6 Numerical results

The following two numerical examples show the effect of blockwise preserving given vectors when preconditioning discretizations of Poisson problems. In these examples we compare a usual H-matrix preconditioner, which is solely based on (1), and an H-matrix preconditioner additionally preserving a single constant vector on each block. To this end, approximate H-Cholesky decompositions with accuracy ε = 1e-1 in the first and ε = 5.6e-3 in the second test are computed. All linear systems were solved using the conjugate gradient method up to an accuracy of 1e-10. Finally, a third test was performed with varying approximation accuracy but an equal number of CG steps in order to compare preconditioners of the same quality. The minimal block size n_min is set to 100 for all tests. All tests were performed on a single core of an Intel Xeon X5482 processor at 3.2 GHz with 64 GB of core memory using the H-matrix library AHMED (see http://bebendorf.ins.uni-bonn.de/AHMED.html).

In the first example, we consider the operator $L = -\operatorname{div} C\nabla$ with coefficients of the form
$$C(x) = \begin{pmatrix} 1 & 0\\ 0 & \alpha(x)\end{pmatrix}.$$
The two-dimensional computational domain Ω is shown in Fig. 2, where in the green part α is set to 1 and in the red part α = 10.

Figure 2: The two-dimensional domain Ω showing the areas of different coefficients.

For the discretization a uniform mesh and linear ansatz functions were used. Tab. 1 shows the overall runtime savings of the Cholesky factorization combined with the CG method. The efficiency of the new approach increases with an increasing number of unknowns. As seen from Fig. 3, the number of CG steps remains almost constant for the new preconditioner, in contrast to the usual approach.
size      new preconditioner              old preconditioner              savings
          Chol.      CG time (steps)      Chol.      CG time (steps)      total time
0.5e5     2.3 s      1.1 s (15)           1.0 s      2.0 s (33)           -0.5 s (-14%)
1e5       7.5 s      2.7 s (16)           2.7 s      5.8 s (44)           -1.8 s (-18%)
2e5       17.9 s     5.1 s (14)           5.5 s      16.5 s (61)          -1.0 s (-5%)
4e5       53.7 s     13.2 s (16)          14.6 s     46.7 s (81)          -5.6 s (-8%)
8e5       121.4 s    21.3 s (12)          29.2 s     135.4 s (115)        21.9 s (15%)
16e5      356.2 s    55.9 s (14)          77.9 s     392.3 s (158)        58.1 s (14%)
32e5      781.7 s    152.6 s (18)         152.6 s    1088.7 s (217)       306.9 s (33%)

Table 1: Time needed for the Cholesky decomposition and CG iteration of the first example.

In the second example, a problem arising from a mixed boundary value problem of the electrostatic Maxwell equations for a computational domain in R^3 (see Fig. 4) is chosen. The boundary surface ∂Ω1 consists of the upper ends of the conductors and ∂Ω2 of the lower ends. The remaining boundary is ∂Ω3 := ∂Ω \ (∂Ω1 ∪ ∂Ω2). The problem is described by
$$-\operatorname{div}(\sigma\nabla u) = 0 \quad\text{in } \Omega, \qquad u = 0 \quad\text{on } \partial\Omega_1, \qquad \frac{\partial u}{\partial\nu} = I \quad\text{on } \partial\Omega_2, \qquad \frac{\partial u}{\partial\nu} = 0 \quad\text{on } \partial\Omega_3.$$
The conductivity of the two conductors is σ = 1e-7 and σ = 1e-4, respectively. For the discretization, quadratic ansatz functions have been chosen, which leads to larger blockwise ranks compared with linear ansatz functions. Hence, the additional rank resulting from the preserved vectors hardly plays any role for the memory consumption. In Table 2 and Fig. 5 it can be seen that the new approach leads to a significant decrease in the number of CG steps. Hence, savings in terms of runtime can be achieved for larger problem sizes.

Figure 4: The two conductors of the second example.
Figure 3: Number of CG steps for the first example on a log-scale.
size      new preconditioner              old preconditioner              savings
          Chol.      CG time (steps)      Chol.      CG time (steps)      total time
195049    35.2 s     8.4 s (22)           23.2 s     18.4 s (50)          -2 s (-5%)
761936    215.1 s    56.3 s (31)          135.1 s    127.1 s (73)         -9 s (-3%)
1339997   406.0 s    110.6 s (33)         244.1 s    262.6 s (82)         -10 s (-2%)
4128445   1461.0 s   422.8 s (39)         849.3 s    1420.0 s (138)       385 s (20%)
5597421   2279.6 s   550.7 s (33)         1169.7 s   2697.4 s (172)       1037 s (27%)

Table 2: Time needed for the Cholesky decomposition and CG iteration of the second example.

Figure 5: Number of CG steps for the second example on a log-scale.

Furthermore, in the third test preconditioners of the same quality were compared. Preconditioners of the same quality here means that the approximation accuracy was chosen such that both preconditioners require the same number of CG steps. It can be seen from Table 3 that in this case there is even an improvement in terms of memory. This is remarkable, as the usual H-matrix approximation already obtains a blockwise best approximation with respect to the spectral norm.

size      CG steps   size new (acc.)        size old (acc.)        savings
195049    26         402 MB (0.0161)        467 MB (0.00094)       56 MB (14%)
761936    26         1889 MB (0.0150)       2210 MB (0.00094)      321 MB (15%)
1339997   33         3272 MB (0.0098)       3870 MB (0.00063)      598 MB (16%)
4128445   35         10570 MB (0.0058)      13083 MB (0.00025)     2513 MB (19%)
5597421   33         15478 MB (0.0065)      20612 MB (0.00011)     5134 MB (25%)

Table 3: Memory needed for the preconditioner of the second example with a fixed number of CG steps.

The numerical experiments show that the preservation of extra vectors can be beneficial in terms of runtime and memory. The accuracy of the approximation hardly has to be adapted with an increasing number of unknowns, as opposed to the usual H-matrix approach. Hence, this new method is especially favorable for large problems, i.e. problems with more than a million unknowns.
References

[1] J. Barnes and P. Hut. A hierarchical O(n ln n) force calculation algorithm. Nature, 324:446-449, 1986.

[2] M. Bebendorf. Hierarchical Matrices: A Means to Efficiently Solve Elliptic Boundary Value Problems, volume 63 of Lecture Notes in Computational Science and Engineering (LNCSE). Springer, 2008. ISBN 978-3-540-77146-3.
[3] M. Bebendorf and Y. Chen. Efficient solution of nonlinear elliptic problems using hierarchical matrices with Broyden updates. Computing, 81:239-257, 2007.

[4] M. Bebendorf and T. Fischer. On the purely algebraic data-sparse approximation of the inverse and the triangular factors of sparse matrices. Num. Lin. Alg. Appl., 18:105-122, 2011.

[5] K. Giebermann. Multilevel approximation of boundary integral operators. Computing, 67:183-207, 2001.

[6] L. Grasedyck and W. Hackbusch. Construction and arithmetics of H-matrices. Computing, 70:295-334, 2003.

[7] A. Greenbaum. Iterative Methods for Solving Linear Systems. Number 17 in Frontiers in Applied Mathematics. SIAM, Philadelphia, PA, 1997.

[8] L. F. Greengard and V. Rokhlin. A fast algorithm for particle simulations. J. Comput. Phys., 73(2):325-348, 1987.

[9] L. F. Greengard and V. Rokhlin. A new version of the fast multipole method for the Laplace equation in three dimensions. In Acta Numerica, 1997, volume 6 of Acta Numer., pages 229-269. Cambridge Univ. Press, Cambridge, 1997.

[10] W. Hackbusch. A sparse matrix arithmetic based on H-matrices. Part I: Introduction to H-matrices. Computing, 62(2):89-108, 1999.

[11] W. Hackbusch. Hierarchische Matrizen. Springer, 2009.
[12] W. Hackbusch and B. N. Khoromskij. A sparse H-matrix arithmetic. Part II: Application to multi-dimensional problems. Computing, 64(1):21-47, 2000.

[13] W. Hackbusch and Z. P. Nowak. On the fast matrix multiplication in the boundary element method by panel clustering. Numer. Math., 54(4):463-491, 1989.

[14] L. Mirsky. Symmetric gauge functions and unitarily invariant norms. Quart. J. Math. Oxford Ser. (2), 11:50-59, 1960.

[15] Y. Notay. On the convergence rate of the conjugate gradient methods in presence of rounding errors. Numerische Mathematik, 65:301-317, 1993.

[16] E. E. Tyrtyshnikov. Mosaic-skeleton approximations. Calcolo, 33(1-2):47-57, 1996. Toeplitz matrices: structures, algorithms and applications (Cortona, 1996).

[17] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press, 1988.