MODIFIED TRUNCATED RANDOMIZED SINGULAR VALUE DECOMPOSITION (MTRSVD) ALGORITHMS FOR LARGE SCALE DISCRETE ILL-POSED PROBLEMS WITH GENERAL-FORM REGULARIZATION∗
ZHONGXIAO JIA† AND YANFEI YANG‡

Abstract. In this paper, we propose new randomization-based algorithms for large scale discrete ill-posed problems with general-form regularization: $\min\|Lx\|$ subject to $\min\|Ax-b\|$, where $L$ is a regularization matrix. Our algorithms are inspired by the modified truncated singular value decomposition (MTSVD) method, which is suitable only for small to medium sized problems, and by randomized SVD algorithms that generate low rank approximations to $A$. We use rank-$k$ truncated randomized SVD (TRSVD) approximations to $A$, obtained by truncating the rank-$(k+q)$ randomized SVD (RSVD) approximations to $A$ rather than the best rank-$k$ approximation to $A$, where $q$ is an oversampling parameter. The resulting algorithms are called modified TRSVD (MTRSVD) methods. At every step, we use the LSQR algorithm to solve the resulting inner least squares problem, which is proved to become better conditioned as $k$ increases. We present sharp bounds for the approximation accuracy of the RSVDs and TRSVDs for severely, moderately and mildly ill-posed problems, and substantially improve a known basic bound for TRSVD approximations. Numerical experiments show that the best regularized solutions obtained by MTRSVD are as accurate as the best regularized solutions obtained by the truncated generalized singular value decomposition (TGSVD) method, and they are at least as accurate as, and can be much more accurate than, those by some existing truncated randomized generalized singular value decomposition (TRGSVD) algorithms.

Key words. MTRSVD, RSVD, TRSVD, TGSVD, discrete ill-posed, general-form Tikhonov regularization, Lanczos bidiagonalization, LSQR

AMS subject classifications. 65F22, 65F10, 65J20, 15A18, 65F35
1. Introduction. Consider the solution of the large-scale linear discrete ill-posed problem

(1.1)   $\min_{x\in\mathbb{R}^n}\|Ax-b\|$ or $Ax=b$, with $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^m$,
where the norm $\|\cdot\|$ is the 2-norm of a vector or matrix, the matrix $A$ is ill conditioned with its singular values decaying to zero with no obvious gap between consecutive ones, and the right-hand side $b=\hat b+e$ is noisy and assumed to be contaminated by a white noise $e$, which may stem from measurement, truncation or discretization errors, where $\hat b$ represents the unknown noise-free right-hand side and $\|e\|<\|\hat b\|$. This kind of problem arises in a variety of applications such as computerized tomography, electrocardiography, image deblurring, signal processing, geophysics, heat propagation, biomedical and optical imaging, groundwater modeling, and many others; see, e.g., [1, 3, 5, 6, 21, 24, 28]. The naive solution $x_{naive}=A^{\dagger}b$ is a meaningless approximation to the true solution $x_{true}=A^{\dagger}\hat b$ since $b$ is contaminated by the noise and $A$ is extremely ill conditioned, where $\dagger$ denotes the Moore-Penrose inverse of a matrix. Therefore, one has to use regularization to obtain a best possible approximation to $x_{true}$ [12, 14].
∗ This work was supported in part by the National Science Foundation of China (No. 11371219).
† Department of Mathematical Sciences, Tsinghua University, 100084 Beijing, China ([email protected]).
‡ Department of Mathematical Sciences, Tsinghua University, 100084 Beijing, China ([email protected]).
One of the common regularization approaches is to solve

(1.2)   $\min\|x\|$ subject to $\|Ax-b\|=\min$.

The truncated singular value decomposition (TSVD) method is one of the most popular regularization methods for solving (1.2). The method computes a minimum 2-norm least squares solution, i.e., the TSVD solution $x_k$, which solves the problem

(1.3)   $\min\|x\|$ subject to $x\in S_k=\{x \mid \|A_kx-b\|=\min\}$

starting with $k=1$ onwards until a best regularized solution is found at some $k$, where $A_k$ is a best rank-$k$ approximation to $A$ with respect to the 2-norm and the index $k$ plays the role of the regularization parameter. It is known from, e.g., [8, p. 79], that

(1.4)   $\|A-A_k\|=\sigma_{k+1}$,

where $\sigma_{k+1}$ is the $(k+1)$-th largest singular value of $A$. The TSVD method is equivalent to the standard-form Tikhonov regularization

(1.5)   $\min_{x\in\mathbb{R}^n}\|Ax-b\|^2+\lambda^2\|x\|^2$
with the regularization parameter $\lambda>0$, in the sense that for any regularization parameter $\lambda\in[\sigma_n,\sigma_1]$ there is a truncation parameter $k$ such that the solutions computed by the two methods are close; furthermore, with the optimal parameter $\lambda_{opt}$ chosen, the best regularized solutions obtained by the two methods have very comparable accuracy with essentially the minimum 2-norm error [12, 14]. Hansen [12] points out that, in many applications, minimizing the 2-norm of the solution, i.e., $\min\|x\|=\min\|I_nx\|$ with $I_n$ being the $n\times n$ identity matrix, is not an optimal choice. On the one hand, $\|x\|$ may not always be affected as much by the errors as the 2-norm of a derivative of the solution. On the other hand, the SVD basis vectors may not be well suited for computing a good regularized solution to (1.1), but choosing a regularization matrix $L\neq I_n$ can often lead to a much better approximate solution. He presents some examples such as data approximation by bivariate splines [4]. Kilmer et al. [22] also give some examples from geophysics and heat distribution, where choosing an $L\neq I_n$ appears more effective. In this paper, we consider exploiting the a priori information on $x_{true}$ by using $\min\|Lx\|$ in (1.2) and (1.3) rather than $\min\|x\|$; that is, we solve the general-form regularization problem

(1.6)   $\min\|Lx\|$ subject to $\|Ax-b\|=\min$,

where $L\in\mathbb{R}^{p\times n}$ is usually a discrete approximation of some derivative operator. When $L\neq I_n$, (1.5) becomes the general-form Tikhonov regularization

(1.7)   $\min_{x\in\mathbb{R}^n}\|Ax-b\|^2+\lambda^2\|Lx\|^2$.

The solution to (1.7) is unique for a given $\lambda>0$ when

$\mathcal{N}(A)\cap\mathcal{N}(L)=\{0\} \iff \mathrm{rank}\begin{pmatrix}A\\ L\end{pmatrix}=n$,
where $\mathcal{N}(\cdot)$ denotes the null space of a matrix. In practical applications, $L$ is typically chosen as one of the following matrices, where $L_1$ and $L_2$ are scaled discrete approximations of the first and second derivative operators of the solution $x_{true}$, respectively:

(1.8)   $L_1=\begin{pmatrix}1 & -1 & & \\ & 1 & -1 & \\ & & \ddots & \ddots \\ & & & 1 & -1\end{pmatrix}\in\mathbb{R}^{(n-1)\times n}$,

(1.9)   $L_2=\begin{pmatrix}-1 & 2 & -1 & & \\ & -1 & 2 & -1 & \\ & & \ddots & \ddots & \ddots \\ & & & -1 & 2 & -1\end{pmatrix}\in\mathbb{R}^{(n-2)\times n}$,

or

(1.10)   $L_3=\begin{pmatrix}L_1\\ L_2\end{pmatrix}\in\mathbb{R}^{(2n-3)\times n}$.
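For readers who wish to experiment with these operators, the following short Python sketch assembles $L_1$, $L_2$ and $L_3$ of (1.8)-(1.10) as sparse matrices. It is an illustrative construction of ours (assuming SciPy is available), not code from the paper.

```python
# Sketch: assemble the regularization matrices L1, L2, L3 of (1.8)-(1.10)
# as sparse matrices. Any equivalent construction works.
import numpy as np
import scipy.sparse as sp

def first_derivative(n):
    """L1 in (1.8): (n-1) x n bidiagonal matrix with rows [1, -1]."""
    return sp.diags([np.ones(n - 1), -np.ones(n - 1)], offsets=[0, 1],
                    shape=(n - 1, n), format="csr")

def second_derivative(n):
    """L2 in (1.9): (n-2) x n tridiagonal matrix with rows [-1, 2, -1]."""
    return sp.diags([-np.ones(n - 2), 2 * np.ones(n - 2), -np.ones(n - 2)],
                    offsets=[0, 1, 2], shape=(n - 2, n), format="csr")

def stacked_derivative(n):
    """L3 in (1.10): L1 stacked on top of L2, of size (2n-3) x n."""
    return sp.vstack([first_derivative(n), second_derivative(n)], format="csr")
```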
For small to moderate sized problems, adapting the TSVD method (1.3) to (1.6), Hansen et al. [16] propose an algorithm, called the modified truncated SVD (MTSVD) method, that solves

(1.11)   $\min\|Lx\|$ subject to $x\in S_k=\{x \mid \|A_kx-b\|=\min\}$
for regularized solutions to (1.6), starting with $k=1$ onwards until a best regularized solution is found for some $k$. As in the TSVD method, $k$ plays the role of the regularization parameter. This approach is an alternative to the TGSVD method. The algorithm first computes the SVD of $A$ and then extracts the best rank-$k$ approximation $A_k$ to $A$ by truncating the SVD of $A$. It solves a sequence of least squares problems by adaptive QR factorizations from $k=1$ onwards until a best regularized solution is found. This algorithm avoids computing the GSVD of the matrix pair $\{A,L\}$. But computing the SVD of $A$ is infeasible for $A$ large, because of the computational cost and memory restrictions.

For $L=I_n$, Xiang and Zou [32] adapt some basic randomized algorithms from [11] to (1.1) and develop a randomized SVD (RSVD) algorithm. RSVD applies $A$ to a Gaussian random matrix to capture the dominant information on the range of $A$, and computes the SVD of a small matrix. From the SVD of the small matrix, one then obtains an approximate SVD of $A$. Halko et al. [11] have given an accuracy analysis of the randomized algorithm and the approximate SVD and established a number of error bounds for them. Randomized algorithms have received considerable attention in recent years and are widely used in a variety of low rank approximations; see, e.g., [9, 10, 11, 23, 26, 27, 30, 31, 32, 33].

For $L\neq I_n$, Xiang and Zou [33] present a randomized GSVD (RGSVD) algorithm to solve (1.1). They focus only on the underdetermined case. First, they compute an RSVD of $A$. Then they compute the GSVD of the matrix pair $\{AQ,LQ\}$, where the columns of $Q$ are the approximate right singular vectors from the RSVD. The matrix $Q$ captures the information on the dominant right singular vectors of $A$, which ensures that $AQQ^T$ is a good low rank approximation to $A$ with high probability; see [11] and the next section for details.
However, the generation of $Q$ does not make use of any information on $L$, and the algorithm simply replaces $L$ by $LQQ^T$. As a consequence, there is no theoretical support that $LQQ^T$ is also a good low rank approximation to $L$, which further means that there is no guarantee that the matrix pair $\{AQQ^T,LQQ^T\}$ is a good low rank approximation to $\{A,L\}$, the critical requirement for RGSVD to succeed in solving (1.1). Wei et al. [31] propose new RGSVD algorithms. They treat both underdetermined and overdetermined cases. For the underdetermined case, their algorithm is the same as that of Xiang and Zou [33] in theory. An algorithmic difference is that they do not compute an approximate SVD of $A$. However, they also need to compute the GSVD of the matrix pair $\{AQ,LQ\}$, where $Q$ captures only the information on the dominant right singular vectors of $A$ and has nothing to do with $L$. Therefore, it has the same deficiency as the algorithm in [33]. For the overdetermined case, their RGSVD method needs to compute the GSVD of the matrix pair $\{B,L\}$, where $B=Q^TA\in\mathbb{R}^{l\times n}$ is a dense matrix with $Q$ being an $m\times l$ orthonormal matrix generated by randomized algorithms, $L\in\mathbb{R}^{p\times n}$, and the parameter $l$ satisfies $l+p\ge n$. Since this algorithm captures the dominant information on $A$ and retains $L$ itself, it generates a good low rank approximation $\{QQ^TA,L\}$ to $\{A,L\}$. Therefore, intuitively, the algorithm works theoretically for (1.1). However, for a large scale (1.1), $n$ must be large, and so is the size of the matrix pair $\{B,L\}$. This makes the computation and storage of the GSVD of the matrix pair $\{B,L\}$ impractical, because one must compute the large dense $n\times n$ right singular vector matrix of this matrix pair and invert it when using the resulting randomized GSVD to solve (1.6) or (1.7). As a result, such an RGSVD algorithm is actually not suitable for large scale problems.

In this paper, inspired by the idea of randomized algorithms and the MTSVD method, we propose a modified truncated randomized SVD (MTRSVD) method for solving (1.6). Our method consists of four steps: first, use the RSVD algorithms [11] to obtain approximate SVDs of $A$ for the underdetermined and overdetermined cases, respectively; second, truncate the approximate SVDs to obtain rank-$k$ TRSVD approximations $\tilde A_k$ to $A$; third, use $\tilde A_k$ to replace the best rank-$k$ approximation $A_k$ in (1.11); finally, solve
(1.12)   $\min\|Lx\|$ subject to $x\in S_k=\{x \mid \|\tilde A_kx-b\|=\min\}$
starting with $k=1$. This step gives rise to a large least squares problem that is different from the one in [16] and can no longer be solved by adaptive QR factorizations because of its large size and the denseness of the coefficient matrix. We use the LSQR algorithm [29] to solve the resulting least squares problem until a best regularized solution is found at some $k=k_0$. We consider a number of theoretical issues on the MTRSVD algorithms. For severely, moderately and mildly ill-posed problems [12, 14, 17], we establish some sharp error bounds for the approximation error $\|A-QQ^TA\|$ (or $\|A-AQQ^T\|$) in terms of $\sigma_{k+1}$, where $QQ^TA$ (or $AQQ^T$) is the rank-$(k+q)$ RSVD approximation and $Q\in\mathbb{R}^{m\times(k+q)}$ (or $Q\in\mathbb{R}^{n\times(k+q)}$) is an orthonormal matrix with $q$ being an oversampling parameter. Halko et al. [11] have presented a number of error bounds for the approximation errors. Their bounds have been used in, e.g., [31, 33] and are good enough for a nearly rank deficient $A$, but turn out to be possibly meaningless for ill-posed problems, since they are pessimistic and may even never become small for any $k$ and $q$. In contrast, our bounds are much sharper. Next, for the rank-$k$ approximations $\tilde A_k$, we focus on a basic bound in [11] and improve it substantially. Our new bounds are unconditionally superior to the bound in [11] and can be much
sharper than the latter, and they explain why the error introduced in the truncation step is not so damaging, an important concern in [11, Remark 9.1]. For the MTRSVD algorithms, we analyze the conditioning of the resulting inner least squares problem at each step $k$. We prove that the condition number decreases monotonically as $k$ increases, so that the LSQR algorithm for solving it generally converges faster and uses fewer inner iterations as $k$ increases for the same stopping tolerance. In the meantime, we consider efficient implementations of the Lanczos bidiagonalization used within LSQR for the inner least squares problems, and discuss some issues on the determination of the optimal regularization parameter $k_0$ in MTRSVD and some other randomized algorithms. Finally, we report numerical experiments. We show that, for the $m\ge n$ case with $n$ not large, the best regularized solutions obtained by MTRSVD are as accurate as those by the TGSVD algorithm and the TRGSVD algorithm in [31]. When $n$ is large, the TRGSVD algorithm in [31] runs out of memory on our computer, but our algorithm works well. For the $m\le n$ case, the best regularized solutions by MTRSVD are very comparable to those by the TGSVD algorithm but can be much more accurate than those by the TRGSVD algorithms in [31, 33] for some moderately and mildly ill-posed problems.

Our paper is organized as follows. In Section 2, we review the RSVD algorithms and establish new error bounds for the RSVD approximations to $A$ for severely, moderately and mildly ill-posed problems, respectively. In Section 3, we present the MTRSVD algorithms, establish new sharp bounds for the TRSVD approximation to $A$, and analyze the conditioning of the inner least squares problems. In Section 4, we report numerical examples to illustrate that our algorithms work well. Finally, we conclude the paper in Section 5.

2. RSVD and sharp error bounds. Let the compact SVD of $A\in\mathbb{R}^{m\times n}$ be
(2.1)   $A=U\Sigma V^T$,

where $U=(u_1,u_2,\ldots,u_s)\in\mathbb{R}^{m\times s}$ and $V=(v_1,v_2,\ldots,v_s)\in\mathbb{R}^{n\times s}$ are column orthonormal, and $\Sigma=\mathrm{diag}(\sigma_1,\sigma_2,\ldots,\sigma_s)\in\mathbb{R}^{s\times s}$ with $s=\min\{m,n\}$ and the singular values labeled as $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_s>0$. Then

(2.2)   $A_k=U_k\Sigma_kV_k^T$

is one of the best rank-$k$ approximations to $A$ with respect to the 2-norm, where $U_k=(u_1,u_2,\ldots,u_k)\in\mathbb{R}^{m\times k}$ and $V_k=(v_1,v_2,\ldots,v_k)\in\mathbb{R}^{n\times k}$ are column orthonormal, and $\Sigma_k=\mathrm{diag}(\sigma_1,\sigma_2,\ldots,\sigma_k)\in\mathbb{R}^{k\times k}$. Define the condition number of $A$ as

$\kappa(A)=\dfrac{\sigma_{\max}(A)}{\sigma_{\min}(A)}=\dfrac{\sigma_1}{\sigma_s}$.
The following Algorithm 1 is the basic randomized algorithm, presented in [11], to compute a low rank approximation to $A$ and an approximate SVD of $A$ for the overdetermined case ($m\ge n$). The mechanism of Algorithm 1 is as follows: the information on the column space of $A$ is extracted in step 2, i.e., $\mathcal{R}(Y)\subseteq\mathcal{R}(A)$, where $\mathcal{R}(\cdot)$ denotes the column space or range of a matrix. It is clear that the columns of $Q$ span the dominant part of the range of $A$ in step 3 and $\mathcal{R}(Q)=\mathcal{R}(Y)\subseteq\mathcal{R}(A)$. Noting $\mathcal{R}(A)=\mathcal{R}(U)$, the factor $Q$ captures the dominant left singular vectors of $A$. In step 4, because $\mathcal{R}(B^T)\subseteq\mathcal{R}(A^T)=\mathcal{R}(V)$, the matrix $B$ provides information on the dominant right singular vectors of $A$. In step 6, the algorithm forms the approximate left singular vectors.
Algorithm 1 (RSVD). Given $A\in\mathbb{R}^{m\times n}$ ($m\ge n$), $l=k+q<n$ and $q\ge 4$, compute an approximate SVD: $A\approx\tilde U\tilde\Sigma\tilde V^T$ with $\tilde U\in\mathbb{R}^{m\times l}$, $\tilde\Sigma\in\mathbb{R}^{l\times l}$ and $\tilde V\in\mathbb{R}^{n\times l}$.
1: Generate an $n\times l$ Gaussian random matrix $\Omega$.
2: Form the $m\times l$ matrix $Y=A\Omega$.
3: Compute the $m\times l$ orthonormal matrix $Q$ via the QR factorization $Y=QR$.
4: Form the $l\times n$ matrix $B=Q^TA$.
5: Compute the compact SVD of the small matrix $B$: $B=W\tilde\Sigma\tilde V^T$.
6: Form the $m\times l$ matrix $\tilde U=QW$, so that $A\approx\tilde U\tilde\Sigma\tilde V^T=QQ^TA$.
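For illustration only, a minimal NumPy sketch of Algorithm 1 is given below; it is our own rendering, not the authors' code, and the function name and interface are assumptions. Algorithm 2 below is obtained by applying the same steps to $A^T$.

```python
# Sketch of Algorithm 1 (RSVD for m >= n), following Halko et al. [11].
import numpy as np

def rsvd(A, k, q=4, rng=None):
    """Return (U, s, V) with A ~= U @ diag(s) @ V.T of rank l = k + q."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    l = k + q
    Omega = rng.standard_normal((n, l))    # step 1: Gaussian test matrix
    Y = A @ Omega                          # step 2: sample the range of A
    Q, _ = np.linalg.qr(Y)                 # step 3: orthonormal basis, m x l
    B = Q.T @ A                            # step 4: small l x n matrix
    W, s, Vt = np.linalg.svd(B, full_matrices=False)  # step 5: compact SVD of B
    U = Q @ W                              # step 6: approximate left singular vectors
    return U, s, Vt.T                      # A ~= (Q Q^T) A = U diag(s) V^T
```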
For the underdetermined case ($m\le n$), Halko et al. [11] present Algorithm 2, which is equivalent to applying Algorithm 1 to $A^T$.

Algorithm 2 (RSVD). Given $A\in\mathbb{R}^{m\times n}$ ($m\le n$), $l=k+q<m$ and $q\ge 4$, compute an approximate SVD: $A\approx\tilde U\tilde\Sigma\tilde V^T$ with $\tilde U\in\mathbb{R}^{m\times l}$, $\tilde\Sigma\in\mathbb{R}^{l\times l}$ and $\tilde V\in\mathbb{R}^{n\times l}$.
1: Generate an $l\times m$ Gaussian random matrix $\Omega$.
2: Form the $l\times n$ matrix $Y=\Omega A$.
3: Compute the $n\times l$ orthonormal matrix $Q$ via the QR factorization $Y^T=QR$.
4: Form the $m\times l$ matrix $B=AQ$.
5: Compute the compact SVD of the small matrix $B$: $B=\tilde U\tilde\Sigma W^T$.
6: Form the $n\times l$ matrix $\tilde V=QW$, so that $A\approx\tilde U\tilde\Sigma\tilde V^T=AQQ^T$.

When $q\ge 4$, Halko et al. [11] establish the following basic estimate on the approximation accuracy of $QQ^TA$ generated by Algorithm 1:

(2.3)   $\|A-QQ^TA\|\le\Big(1+6\sqrt{(k+q)\,q\log q}\Big)\sigma_{k+1}+3\sqrt{k+q}\Big(\sum_{j>k}\sigma_j^2\Big)^{1/2}$
with failure probability at most $3q^{-q}$. Based on (2.3), Halko et al. [11] derive a simplified elegant error bound

(2.4)   $\|A-QQ^TA\|\le\Big(1+9\sqrt{(k+q)(n-k)}\Big)\sigma_{k+1}$
with failure probability at most $3q^{-q}$. As we can see clearly, for a fixed $k$ the above two bounds increase monotonically with $q$, which is not in accordance with the basic result that, for a fixed $k$, the left-hand side of (2.4) decreases monotonically with $q$; see Proposition 8.5 of [11]. Xiang and Zou [32] and Wei et al. [31] directly exploit the bound (2.4) in their analysis. For a nearly rank deficient $A$, the monotonic increase of the right-hand sides of (2.3) and (2.4) with $q$ does no serious harm, since they can be small enough to detect the numerical rank $k$ whenever $q$ is not large, the singular values satisfy $\sigma_k\gg\sigma_{k+1}$ and $\sigma_{k+1}$ is numerically small. In the context of ill-posed problems, however, the situation is completely different, since (2.4) may be too pessimistic and meaningless, as will be clear soon. To this end, we notice another basic bound from [11] that has received little attention but appears more insightful and useful than (2.3), at least in the context of ill-posed problems:

(2.5)   $\|A-QQ^TA\|\le\Big(1+16\sqrt{1+\tfrac{k}{q+1}}\Big)\sigma_{k+1}+\dfrac{8\sqrt{k+q}}{q+1}\Big(\sum_{j>k}\sigma_j^2\Big)^{1/2}$
with failure probability at most $3e^{-q}$. In the manner of deriving (2.4) from (2.3), we have a simplified form of (2.5):

(2.6)   $\|A-QQ^TA\|\le\Big(1+16\sqrt{1+\tfrac{k}{q+1}}+\dfrac{8\sqrt{(k+q)(n-k)}}{q+1}\Big)\sigma_{k+1}$.
In contrast to (2.3) and (2.4), an advantage of the bounds (2.5) and (2.6) is that they decrease monotonically with $q$ for a given $k$. Compared with (2.3), a minor theoretical disadvantage of (2.5) is that its failure probability $3e^{-q}$ is a little higher than the $3q^{-q}$ of (2.3) for $q\ge 4$. But this should not cause any essential problem for practical purposes. For the bound (2.6), it is easily justified that the factor in front of $\sigma_{k+1}$ lies between $1+O(\sqrt{n})$ and $1+O(n)$ for a small fixed $q$, and it is $1+O(\sqrt{n})$ for $k$ not big when dynamically choosing $q=k$ roughly. In contrast, for the bound (2.4), the factor in front of $\sigma_{k+1}$ ranges from $1+O(\sqrt{n})$ to $1+O(n)$ for any $q$. We will show that the bounds (2.4) and (2.6) may be fatal overestimates in the context of ill-posed problems.

Based on (2.5) and following Jia's works [18, 19], we carefully analyze the approximation accuracy of $QQ^TA$ for three kinds of ill-posed problems: severely, moderately and mildly ill-posed problems, and establish much more accurate bounds. Before proceeding, we first give a precise characterization of the degree of ill-posedness of (1.1), which was introduced in [17] and has been widely used in, e.g., the books [1, 7, 12, 14, 25].

Definition 2.1. If $\sigma_j=O(\rho^{-j})$, $j=1,2,\ldots,n$ with $\rho>1$, then (1.1) is severely ill-posed. If the singular values $\sigma_j=O(j^{-\alpha})$, $j=1,2,\ldots,n$, then (1.1) is mildly or moderately ill-posed for $1/2<\alpha\le 1$ or $\alpha>1$.

We mention that the requirement $\alpha>1/2$ does not appear in the aforementioned books, but it is added in [18, 19], where it is pointed out that this requirement is naturally met when the kernel of an underlying linear Fredholm equation of the first kind is square integrable over a defined domain. Keep in mind that the factors in front of $\sigma_{k+1}$ in (2.4) and (2.6) lie between $1+O(\sqrt{n})$ and $1+O(n)$ for a given $k$. However, for moderately and mildly ill-posed problems, the bounds (2.4) and (2.6) may never be small for $k$ not big and $\alpha$ close to one; for $\alpha$ close to $1/2$, they are definitely not small as $k$ increases up to $n-q-1$. These bounds, if realistic, would mean that Algorithm 1 may never generate a meaningful rank-$(k+q)$ approximation to $A$. Fortunately, as we will show below, the bound (2.6) can be improved substantially for the three kinds of ill-posed problems, and the new bounds indicate that Algorithm 1 (or Algorithm 2) indeed generates very accurate rank-$(k+q)$ approximations to $A$.

Theorem 2.2. For severely ill-posed problems with $\sigma_j=O(\rho^{-j})$ and $\rho>1$, $j=1,2,\ldots,n$, it holds that

(2.7)   $\|A-QQ^TA\|\le\Big(1+16\sqrt{1+\tfrac{k}{q+1}}+\dfrac{8\sqrt{k+q}}{q+1}\big(1+O(\rho^{-2})\big)\Big)\sigma_{k+1}$

with failure probability at most $3e^{-q}$ for $q\ge 4$ and $k=1,2,\ldots,n-q-1$.
Proof. By the assumption on the singular values $\sigma_j$, we obtain

$\Big(\sum_{j=k+1}^n\sigma_j^2\Big)^{1/2}=\sigma_{k+1}\Big(\sum_{j=k+1}^n\frac{\sigma_j^2}{\sigma_{k+1}^2}\Big)^{1/2}=\sigma_{k+1}\Big(1+\sum_{j=k+2}^n\frac{\sigma_j^2}{\sigma_{k+1}^2}\Big)^{1/2}$

$=\sigma_{k+1}\Big(1+\sum_{j=k+2}^nO\big(\rho^{2(k-j)+2}\big)\Big)^{1/2}=\sigma_{k+1}\Big(1+O\Big(\rho^{-2}\,\frac{1-\rho^{-2(n-k-1)}}{1-\rho^{-2}}\Big)\Big)^{1/2}$

(2.8)   $=\sigma_{k+1}\big(1+O(\rho^{-2})\big)^{1/2}=\sigma_{k+1}\big(1+O(\rho^{-2})\big)$.
Substituting (2.8) into (2.5) gives (2.7).

Theorem 2.3. For moderately and mildly ill-posed problems with $\sigma_j=\zeta j^{-\alpha}$, $j=1,2,\ldots,n$, where $\alpha>1/2$ and $\zeta>0$ is some constant, it holds that

(2.9)   $\|A-QQ^TA\|\le\Big(1+16\sqrt{1+\tfrac{k}{q+1}}+\dfrac{8\sqrt{k+q}}{q+1}\Big(\tfrac{k+1}{k}\Big)^{\alpha}\sqrt{\tfrac{k}{2\alpha-1}}\Big)\sigma_{k+1}$

with failure probability at most $3e^{-q}$ for $q\ge 4$ and $k=1,2,\ldots,n-q-1$.

Proof. By the assumption on the singular values $\sigma_j$, we obtain
$\Big(\sum_{j=k+1}^n\sigma_j^2\Big)^{1/2}=\sigma_{k+1}\Big(\sum_{j=k+1}^n\frac{\sigma_j^2}{\sigma_{k+1}^2}\Big)^{1/2}=\sigma_{k+1}\Big(\sum_{j=k+1}^n\Big(\frac{j}{k+1}\Big)^{-2\alpha}\Big)^{1/2}$

$=\sigma_{k+1}(k+1)^{\alpha}\Big(\sum_{j=k+1}^n\frac{1}{j^{2\alpha}}\Big)^{1/2}<\sigma_{k+1}(k+1)^{\alpha}\Big(\int_k^{\infty}\frac{1}{x^{2\alpha}}\,dx\Big)^{1/2}$ (due to $\alpha>\tfrac12$)

(2.10)   $=\sigma_{k+1}\Big(\frac{k+1}{k}\Big)^{\alpha}\sqrt{\frac{k}{2\alpha-1}}$.
Substituting (2.10) into (2.5) proves (2.9).

From Theorems 2.2-2.3, it is easy to see that the error bounds (2.7) and (2.9) decrease with the oversampling number $q$. Importantly, whenever we take $q=k$ roughly, the factors in front of $\sigma_{k+1}$ in (2.7) and (2.9) reduce to $O(1)$, independent of $n$, provided that $\alpha$ is not close to $1/2$.
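As a quick numeric illustration (ours, not from the paper), the following snippet evaluates the factors multiplying $\sigma_{k+1}$ in (2.6) and (2.9) for sample values of $n$, $k$, $q$ and $\alpha$; the chosen numbers are arbitrary.

```python
# Sketch: compare the factors in front of sigma_{k+1} in the general bound (2.6)
# and the new bound (2.9) for moderately ill-posed problems. Sample values only.
import numpy as np

def factor_26(n, k, q):
    return 1 + 16 * np.sqrt(1 + k / (q + 1)) + 8 * np.sqrt((k + q) * (n - k)) / (q + 1)

def factor_29(k, q, alpha):
    return (1 + 16 * np.sqrt(1 + k / (q + 1))
            + 8 * np.sqrt(k + q) / (q + 1) * ((k + 1) / k) ** alpha
              * np.sqrt(k / (2 * alpha - 1)))

n, k, q, alpha = 1024, 10, 10, 2.0   # arbitrary illustrative choices
print(factor_26(n, k, q), factor_29(k, q, alpha))
# factor_26 grows like sqrt(n); factor_29 is independent of n.
```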
On the other hand, for a fixed small $q\ge 4$, the factors in front of $\sigma_{k+1}$ in (2.7) and (2.9) are $1+O(\sqrt{k})$ and $1+O(k)$ for $\alpha$ not close to one, respectively, meaning that the rank-$l$ approximation to $A$ may be more accurate for severely ill-posed problems than for moderately and mildly ill-posed problems. For a fixed $k$, the bigger $q$ is, the smaller the bounds (2.7) and (2.9) are, i.e., the more accurate the rank-$k$ RSVD approximations. As a result, in any event, our new bounds are much sharper than (2.4) and (2.6), and give more insight into the accuracy of rank-$l$ approximations for $k$ not big and $\alpha$ not close to $1/2$, where the factors in front of $\sigma_{k+1}$ in (2.4) and (2.6) have been shown to lie between $1+O(\sqrt{n})$ and $1+O(n)$. Finally, we mention that all the results on $\|A-QQ^TA\|$ in this section apply to $\|A-AQQ^T\|$ as well, where $AQQ^T$ is generated by Algorithm 2.

3. TRSVD and error bounds, and the MTRSVD algorithms and their analysis. We consider the MTRSVD method and compute the MTRSVD solutions $x_{L,k}$ as the solutions to the problem (1.12) starting with $k=1$. The MTRSVD solutions $x_{L,k}$ are regularized solutions to the general-form regularization problem (1.6). MTRSVD first extracts a rank-$k$ TRSVD approximation $\tilde A_k$ from $QQ^TA$ (or $AQQ^T$) to $A$, and then utilizes the LSQR algorithm [29] to iteratively solve the resulting least squares problem at each iteration $k$ in (1.12). This step is called the inner iteration. Starting with $k=1$, MTRSVD proceeds until a best regularized solution is found at some $k=k_0$, similarly to the TSVD method and the MTSVD method described in the introduction.

Recall that Algorithm 1 generates a rank-$l$ approximation $\tilde U\tilde\Sigma\tilde V^T$ to $A$. Let
$\tilde U=(\tilde u_1,\tilde u_2,\ldots,\tilde u_l)\in\mathbb{R}^{m\times l}$, $\tilde V=(\tilde v_1,\tilde v_2,\ldots,\tilde v_l)\in\mathbb{R}^{n\times l}$ and $\tilde\Sigma=\mathrm{diag}(\tilde\sigma_1,\tilde\sigma_2,\ldots,\tilde\sigma_l)\in\mathbb{R}^{l\times l}$

with $\tilde\sigma_1\ge\tilde\sigma_2\ge\cdots\ge\tilde\sigma_l>0$. Take

(3.1)   $\tilde U_k=(\tilde u_1,\tilde u_2,\ldots,\tilde u_k)\in\mathbb{R}^{m\times k}$, $\tilde V_k=(\tilde v_1,\tilde v_2,\ldots,\tilde v_k)\in\mathbb{R}^{n\times k}$ and $\tilde\Sigma_k=\mathrm{diag}(\tilde\sigma_1,\tilde\sigma_2,\ldots,\tilde\sigma_k)\in\mathbb{R}^{k\times k}$,

and form

(3.2)   $\tilde A_k=\tilde U_k\tilde\Sigma_k\tilde V_k^T$,

which is the best rank-$k$ approximation to $\tilde U\tilde\Sigma\tilde V^T=QQ^TA$ and is called a rank-$k$ TRSVD approximation to $A$. Halko et al. [11] prove the following basic result.

Theorem 3.1. Let $\tilde A_k$ be the rank-$k$ approximation to $A$ defined by (3.2). Then the approximation error is

(3.3)   $\|A-\tilde A_k\|\le\sigma_{k+1}+\|A-QQ^TA\|$.
The bound (3.3) reflects the worst case. In Remark 9.1, Halko et al. [11] point out that "In the randomized setting, the truncation step appears to be less damaging than the error bound of Theorem 9.3 (i.e., (3.3) here) suggests, but we currently lack a complete theoretical understanding of its behavior." That is to say, the first term $\sigma_{k+1}$ in (3.3) is generally conservative and may be reduced substantially. Keep in mind that $A$ has $s$ singular values $\sigma_i$ with $s=\min\{m,n\}$. Jia [20] has improved (3.3) and derived sharper bounds, which explain why (3.3) may be an overestimate, as shown below.

Theorem 3.2. Let $\tilde A_k$ be the rank-$k$ TRSVD approximation to $A$ defined by (3.2). Then it holds that
(3.4)   $\|A-\tilde A_k\|\le\tilde\sigma_{k+1}+\|A-QQ^TA\|$,

where $\tilde\sigma_{k+1}$ is the $(k+1)$-th singular value of $Q^TA$ and satisfies

(3.5)   $\sigma_{m-q+1}\le\tilde\sigma_{k+1}\le\sigma_{k+1}$

with the definition $\sigma_{n+1}=\cdots=\sigma_m=0$. Analogously, for the rank-$k$ TRSVD approximation $\tilde A_k$ constructed by Algorithm 2, it holds that

(3.6)   $\|A-\tilde A_k\|\le\tilde\sigma_{k+1}+\|A-AQQ^T\|$,

where $\tilde\sigma_{k+1}$ is the $(k+1)$-th singular value of $AQ$ and satisfies

(3.7)   $\sigma_{n-q+1}\le\tilde\sigma_{k+1}\le\sigma_{k+1}$

with the definition $\sigma_{m+1}=\cdots=\sigma_n=0$. In particular, the two inequalities "$\le$" become strict "$<$" when $m>n$. Once $\tilde\sigma_{k+1}<\sigma_{k+1}$ considerably, the first term of (3.4) is negligible relative to the second term, and we approximately have

$\|A-\tilde A_k\|\approx\|A-QQ^TA\|$.
Regarding the MTRSVD solution $x_{L,k}$, we can establish the following result.

Theorem 3.3. Let $\tilde A_k$ denote the rank-$k$ TRSVD approximation to $A$ generated by Algorithm 1 or Algorithm 2. Then the solution to (1.12) can be written as

(3.8)   $x_{L,k}=x_k-\big(L(I_n-\tilde V_k\tilde V_k^T)\big)^{\dagger}Lx_k$,

where $x_k$ is the minimum 2-norm solution to the least squares problem

(3.9)   $\min_{x\in\mathbb{R}^n}\|\tilde A_kx-b\|$.
Proof. Following Eldén [5], we have

(3.10)   $x_{L,k}=\Big(I_n-\big(L(I_n-\tilde A_k^{\dagger}\tilde A_k)\big)^{\dagger}L\Big)\tilde A_k^{\dagger}b$.

Noting (3.2), we have $\tilde A_k^{\dagger}\tilde A_k=\tilde V_k\tilde V_k^T$.
Substituting the above into (3.10), we obtain (3.8).

Let $z_k=\big(L(I_n-\tilde V_k\tilde V_k^T)\big)^{\dagger}Lx_k$. Then $z_k$ is the minimum 2-norm solution to the least squares problem

(3.11)   $\min_{z\in\mathbb{R}^n}\|L(I_n-\tilde V_k\tilde V_k^T)z-Lx_k\|$.
We must point out that the problem (3.11) and its coefficient matrix are different from those in the MTSVD method, in which the coefficient matrix is $LV_{n-k}$ with $V_{n-k}=(v_{k+1},\ldots,v_n)$ available from the SVD (2.1) of $A$. Because of the large size of $L(I_n-\tilde V_k\tilde V_k^T)$, we suppose that the problem (3.11) can only be solved by iterative solvers. We will use the LSQR algorithm [29] to solve the problem. In order to make full use of the sparsity of $L$ itself and reduce the computational cost and memory, it is vital to avoid forming the dense matrix $L(I_n-\tilde V_k\tilde V_k^T)$ explicitly within LSQR. Notice that the only action of $L(I_n-\tilde V_k\tilde V_k^T)$ in the Lanczos bidiagonalization process and LSQR is to form its products and the products of its transpose with vectors. We propose Algorithm 3, which efficiently implements the Lanczos bidiagonalization process with the starting vector $\hat u_1=Lx_k/\|Lx_k\|$.

Algorithm 3 ($\hat k$-step Lanczos bidiagonalization process on $L(I_n-\tilde V_k\tilde V_k^T)$)
1: Take $\hat u_1=Lx_k/\|Lx_k\|$, $w_1=L^T\hat u_1$, and define $\beta_1\hat v_0=0$.
2: For $j=1,2,\ldots,\hat k$
   $\hat p=w_j-\tilde V_k(\tilde V_k^Tw_j)-\beta_j\hat v_{j-1}$
   $\alpha_j=\|\hat p\|$;  $\hat v_j=\hat p/\alpha_j$
   $g_j=\tilde V_k^T\hat v_j$
   $\hat r=L\hat v_j-L(\tilde V_kg_j)-\alpha_j\hat u_j$
   $\beta_{j+1}=\|\hat r\|$;  $\hat u_{j+1}=\hat r/\beta_{j+1}$
   $w_{j+1}=L^T\hat u_{j+1}$
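To make this matrix-free idea concrete, here is a hedged SciPy sketch (our own, not the paper's code) of steps 2-4 of MTRSVD: it forms the minimum 2-norm solution $x_k$ of (3.9) from the TRSVD factors, applies $L(I_n-\tilde V_k\tilde V_k^T)$ as a LinearOperator inside scipy.sparse.linalg.lsqr, and assembles $x_{L,k}$ via (3.8). SciPy's lsqr is used here in place of the authors' own LSQR implementation with full reorthogonalization; in exact arithmetic LSQR started from zero returns the minimum 2-norm solution $z_k$.

```python
# Sketch: solve (3.11) matrix-free with LSQR and form x_{L,k} as in (3.8).
# Uk, sk, Vk are the leading k RSVD factors (e.g., from rsvd above); L is sparse.
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

def mtrsvd_solution(Uk, sk, Vk, L, b, atol=1e-6, btol=1e-6):
    # Step 2: minimum 2-norm solution of (3.9), x_k = Vk diag(1/sk) Uk^T b.
    xk = Vk @ ((Uk.T @ b) / sk)
    p, n = L.shape

    # Matrix-free operator M = L (I - Vk Vk^T); never formed explicitly.
    def matvec(z):
        return L @ (z - Vk @ (Vk.T @ z))

    def rmatvec(y):
        w = L.T @ y
        return w - Vk @ (Vk.T @ w)

    M = LinearOperator((p, n), matvec=matvec, rmatvec=rmatvec)

    # Step 3: z_k from (3.11) by LSQR, at most n - k inner iterations.
    zk = lsqr(M, L @ xk, atol=atol, btol=btol, iter_lim=n - Vk.shape[1])[0]

    # Step 4: x_{L,k} = x_k - z_k, cf. (3.8).
    return xk - zk
```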
We now consider the solution of (3.11) using LSQR. Suppose

(3.12)   $\tilde V=(\tilde V_k,\ \tilde V_{n-k})\in\mathbb{R}^{n\times n}$

is an orthogonal matrix. It is then direct to obtain

$L(I_n-\tilde V_k\tilde V_k^T)=L\tilde V_{n-k}\tilde V_{n-k}^T$.

Since $\tilde V_{n-k}$ is column orthonormal, the nonzero singular values of $L\tilde V_{n-k}\tilde V_{n-k}^T$ are identical to the singular values of $L\tilde V_{n-k}$. As a result, we have

(3.13)   $\kappa\big(L(I_n-\tilde V_k\tilde V_k^T)\big)=\kappa\big(L\tilde V_{n-k}\tilde V_{n-k}^T\big)=\kappa\big(L\tilde V_{n-k}\big)$.
Next, we cite a lemma [8, p. 78] and exploit it to investigate how the conditioning of (3.11) changes as $k$ increases.

Lemma 3.4. If $B\in\mathbb{R}^{m\times n}$, $m>n$, and $c\in\mathbb{R}^m$, then

(3.14)   $\sigma_{\max}\big((B,\ c)\big)\ge\sigma_{\max}(B)$,
(3.15)   $\sigma_{\min}\big((B,\ c)\big)\le\sigma_{\min}(B)$.
This lemma shows that if a column is added to a rectangular matrix, then the largest singular value increases and the smallest singular value decreases. Therefore, we directly obtain the following result on the conditioning of (3.11).

Theorem 3.5. Let the matrix $\tilde V_{n-k}$ be defined by (3.12). Then for $p\ge n-k$, we have

(3.16)   $\kappa\big(L\tilde V_{n-k}\big)\ge\kappa\big(L\tilde V_{n-(k+1)}\big)$,  $k=1,2,\ldots,n-1$.
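The monotonicity in (3.16) is easy to observe numerically. The following small check is ours, with arbitrary sizes and a random orthonormal $\tilde V$ taken purely for illustration; it is not an experiment from the paper.

```python
# Sketch: numerically illustrate Theorem 3.5, kappa(L V_{n-k}) >= kappa(L V_{n-(k+1)}).
import numpy as np

rng = np.random.default_rng(0)
n = 200
L = np.diff(np.eye(n), axis=0)          # first-difference operator (L1 of (1.8) up to sign)
Vtilde, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random n x n orthogonal matrix

def cond_of_step(k):
    """Condition number of L V_{n-k}, cf. (3.13) and (3.16)."""
    s = np.linalg.svd(L @ Vtilde[:, k:], compute_uv=False)
    return s[0] / s[-1]

print([round(cond_of_step(k), 2) for k in range(1, 6)])  # non-increasing in k
```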
In practice, for $L$ defined by (1.8)-(1.10), the condition $p\ge n-k$ in Theorem 3.5 is naturally satisfied, except for $L_2$, which requires $k\ge 2$. Consequently, for all $L$ in (1.8)-(1.10), (3.13) and Theorem 3.5 mean that

(3.17)   $\kappa\big(L(I_n-\tilde V_{k-1}\tilde V_{k-1}^T)\big)\ge\kappa\big(L(I_n-\tilde V_k\tilde V_k^T)\big)$,  $k=1,2,\ldots,n-1$,
except for $L_2$, which requires $k\ge 2$. This result indicates that, when applied to solving (3.11), the LSQR algorithm generally converges faster as $k$ increases, by recalling that the convergence factor of LSQR is

$\dfrac{\kappa(L\tilde V_{n-k})-1}{\kappa(L\tilde V_{n-k})+1}$;

see [2, p. 291]. In particular, in exact arithmetic, LSQR will find the exact solution $z_k$ of (3.11) after at most $n-k$ iterations.

Having done the above, we can present our MTRSVD algorithm for the $m\ge n$ case, named Algorithm 4.

Algorithm 4 (MTRSVD). Given $A\in\mathbb{R}^{m\times n}$ ($m\ge n$), $l=k+q<n$ and $q\ge 4$, compute the solution $x_{L,k}$ of (1.12).
1: Use Algorithm 1 to compute the rank-$k$ TRSVD approximation $\tilde A_k=\tilde U_k\tilde\Sigma_k\tilde V_k^T$ to $A$.
2: Compute the minimum 2-norm solution $x_k$ to (3.9).
3: Compute the solution $z_k$ to (3.11) by LSQR.
4: Compute the solution $x_{L,k}$, defined by (3.8), to the problem (1.12).

For the $m\le n$ case, making use of Algorithm 2, we present Algorithm 5, a variant of Algorithm 4.

Algorithm 5 (MTRSVD). Given $A\in\mathbb{R}^{m\times n}$ ($m\le n$), $l=k+q<n$ and $q\ge 4$, compute the solution $x_{L,k}$ of (1.12).
1: Use Algorithm 2 to compute the rank-$k$ TRSVD approximation $\tilde A_k=\tilde U_k\tilde\Sigma_k\tilde V_k^T$ to $A$.
2: Compute the minimum 2-norm solution $x_k$ to (3.9).
3: Compute the solution $z_k$ to (3.11) by LSQR.
4: Compute the solution $x_{L,k}$, defined by (3.8), to the problem (1.12).

For a given oversampling parameter $q\ge 4$, Algorithms 4-5 have only one parameter $k$ to be determined for finding a best possible regularized solution $x_{L,k}$. In contrast, the algorithms of Wei et al. [31] and Xiang and Zou [33] involve two parameters: the column number $l$ of the Gaussian random matrix and the optimal Tikhonov regularization parameter $\lambda$, which need two kinds of parameter choice methods. Wei et al. [31] use Algorithm 4.2 in [11] to determine $l$ adaptively until $\|A-QQ^TA\|\le\tilde\varepsilon$
is satisfied. In the numerical experiments, they take a fixed $\tilde\varepsilon=10^{-2}$ for all the test problems and noise levels. However, they emphasize that the choice of an optimal tolerance is an open problem. As a matter of fact, the size of $\tilde\varepsilon$ is problem and noise level dependent, and it is impossible to design a general fixed $\tilde\varepsilon$ for all problems and noise levels. A basic fact is that the smaller the noise level, the more dominant SVD components of $A$ are needed [12, 14] to form best regularized solutions. This means that the smaller the noise level, the smaller $\tilde\varepsilon$ must be. The TSVD and MTSVD methods determine their regularization parameters $k$ using some parameter-choice methods, such as GCV and the L-curve criterion. The optimal $k_0$ relies on the noise level, no matter whether or not it is known. We need to determine the optimal regularization parameter $k_0$ in the MTRSVD algorithms. It is crucial to realize that, just as in the TSVD method, the parameter $k$ plays the role of the regularization parameter in the MTSVD, TGSVD, TRGSVD and MTRSVD methods. Each of these methods must exhibit semi-convergence [12, 14, 28]: the error $\|L(x_k^{reg}-x_{true})\|$ decreases (or, correspondingly, $\|Lx_k^{reg}\|$ increases steadily) with respect to $k$ in the first stage until some step $k=k_0$, and then starts to increase (or $\|Lx_k^{reg}\|$ starts to increase considerably) for $k>k_0$. Such a $k_0$ is exactly the optimal one, at which the regularized solution $x_{k_0}^{reg}$ is the most accurate and best possible one obtained by the MTRSVD algorithms. Specifically, as far as Algorithms 4-5 are concerned, given an oversampling parameter $q$, they proceed from $k=1$ onwards, successively increment $l=k+q$ and expand $Q$. The algorithms compute a sequence of regularized solutions, and stop a few steps after the occurrence of semi-convergence. In practice, when there is no information about the noise level, the L-curve criterion is a good choice to determine the above optimal $k_0$ [12, 14]. Its implementation is routine in our current context for all the algorithms under consideration, so we do not repeat any detail.

4. Numerical examples. In this section, we report numerical experiments to demonstrate that the MTRSVD algorithms can compute regularized solutions as accurately as the standard TGSVD method and at least as accurately as those obtained by the RGSVD algorithms in [31] and [33]. For some test problems our regularized solutions are substantially better than those by RGSVD. We choose examples from Hansen's regularization toolboxes [13] and [15]. We generate Gaussian noise vectors $e$ whose entries are normally distributed with mean zero. Define the noise level $\varepsilon=\|e\|/\|\hat b\|$. We use $\varepsilon=10^{-3},10^{-4}$. To simulate exact arithmetic, full reorthogonalization is used during the Lanczos bidiagonalization process. We choose $L=L_1$ and $L_3$ defined by (1.8) and (1.10), respectively. For $L=L_2$, we find that the results and comparisons are very similar to those for $L=L_1$, so we omit the reports on $L=L_2$. For each $k$, we define the relative error by $\|L(x_k^{reg}-x_{true})\|/\|Lx_{true}\|$. The TRGSVD algorithms in [31] and [33] are denoted by weirgsvd and xiangrgsvd, respectively. Here we make some non-essential modifications to their algorithms in order to compare the algorithms under consideration more conveniently. The original RGSVD algorithms in [31, 33] are combinations of RGSVD and Tikhonov regularization. We truncate the rank-$l$ RGSVD and obtain a rank-$k$ truncated randomized GSVD (TRGSVD), leading to the corresponding TRGSVD algorithms.
The original RGSVD algorithms with Tikhonov regularization and the current TRGSVD algorithms are the same in the spirit of the TGSVD algorithm and the GSVD with Tikhonov regularization [12, 14], and they generate the best regularized solutions with essentially the same accuracy. In the tables to be presented, we list the given oversampling parameter $q$ and the optimal regularization parameter $k_0$ in parentheses. We adopt the LSQR algorithm to solve the least squares problems (3.11) with the default relative error tolerance $10^{-6}$, and the maximum number of inner iterations is $\min\{p,n-k\}=n-k$. We will find that at step $k$ the actual number of inner iterations is generally much smaller than $n-k$. All the computations are carried out in Matlab R2015b 64-bit on an Intel Core i3-2120 CPU 3.30GHz processor with 4 GB RAM.

4.1. The m ≥ n case. We first present the results on some one dimensional test problems from Hansen's regularization toolbox [13], and then report the results on a two dimensional test problem from Hansen's regularization toolbox [15].

4.1.1. The one dimensional case. All test problems arise from the discretization of one dimensional first kind Fredholm integral equations

(4.1)   $\int_a^b k(s,t)x(t)\,dt=f(s),\quad c\le s\le d$.
For each problem we use the code of [13] to generate $A$, the true solution $x_{true}$ and the noise-free right-hand side $\hat b$. The descriptions of the test problems are given in Table 1, where we choose the parameter "example = 2" for the test problem deriv2.

Table 1: The description of test problems.
Problem    Description                                   Ill-posedness
shaw       One-dimensional image restoration model       severe
foxgood    Test problem with a discontinuous solution    severe
gravity    One-dimensional gravity surveying problem     severe
heat       Inverse heat equation                         moderate
phillips   Phillips' "famous" test problem               moderate
deriv2     Computation of second derivative              mild
In Table 2, we display the relative errors of the best regularized solutions by tgsvd, weirgsvd and mtrsvd with L = L1 and ε = 10−3 , 10−4 , respectively, where tgsvd, weirgsvd and mtrsvd are the standard TGSVD algorithm, the TRGSVD algorithm due to Wei et al. [31] and our Algorithm 4. They illustrate that for all test problems with m = n = 1, 024 the solution accuracy of our mtrsvd algorithm is very comparable to that of the TGSVD algorithm and the TRGSVD algorithm. For m = n = 10, 240, the TGSVD algorithm and the TRGSVD algorithm due to Wei et al. are out of memory in our computer, but our algorithm works well and the best regularized solution is more accurate than the corresponding one for m = n = 1, 024. The only exception is foxgood, for which the best regularized solutions have similar accuracy for these two sets of m, n and ε = 10−3 . Importantly, we observe from the table that for each test problem the best regularized solution by each algorithm is correspondingly more accurate for ε = 10−4 than ε = 10−3 . For each algorithm, except for the problems shaw and foxgood where the optimal regularization parameters k0 are the same for ε = 10−3 , 10−4 , for each of the other problems, the optimal regularization
Table 2: The comparison of Algorithm 4 (mtrsvd) and the others with L = L1 and ε = 10^{-3}, 10^{-4}.

ε = 10^{-3}       m = n = 1,024                                     m = n = 10,240
           q    tgsvd        weirgsvd     mtrsvd        weirgsvd   mtrsvd
shaw       9    0.1681(8)    0.1681(9)    0.1681(9)     -          0.1428(9)
foxgood    9    0.2743(3)    0.2743(4)    0.2742(4)     -          0.2757(4)
gravity    7    0.2660(10)   0.2675(11)   0.2660(11)    -          0.2532(11)
heat       8    0.1664(30)   0.1673(31)   0.1623(29)    -          0.1399(36)
phillips   15   0.0479(13)   0.0480(14)   0.0401(12)    -          0.0345(18)
deriv2     6    0.3341(11)   0.3360(12)   0.3462(12)    -          0.2916(16)

ε = 10^{-4}       m = n = 1,024                                     m = n = 10,240
           q    tgsvd        weirgsvd     mtrsvd        weirgsvd   mtrsvd
shaw       9    0.1221(8)    0.1221(9)    0.1221(9)     -          0.1216(9)
foxgood    9    0.2740(3)    0.2740(4)    0.2740(4)     -          0.2194(6)
gravity    4    0.2007(14)   0.2069(15)   0.2057(15)    -          0.1996(15)
heat       12   0.1293(44)   0.1272(46)   0.1269(46)    -          0.1150(55)
phillips   11   0.0263(16)   0.0263(17)   0.0263(17)    -          0.0216(25)
deriv2     9    0.2455(19)   0.2535(19)   0.2579(18)    -          0.2358(23)
parameter k0 is bigger for ε = 10−4 than for ε = 10−3 , as expected. This justifies that the smaller the noise level ε is, the more SVD dominant components of A are needed to form best regularized solutions. Finally, as is seen, for each problem and the given ε, the optimal k0 are almost the same for all the algorithms. In Table 3, we display the relative errors of the best regularized solutions by tgsvd, weirgsvd and mtrsvd with L = L3 and ε = 10−3 , 10−4 , respectively. The results and performance evaluations on the three algorithms are analogous to those for L = L1 , and the details are thus omitted. Figure 1 depicts the convergence processes of TGSVD, TRGSVD and MTRSVD as k increases for the four test problems with L = L3 , ε = 10−3 and m = n = 1, 024. We can see that three algorithms have very similar convergence processes and the relative errors of regularized solutions obtained by MTRSVD are almost identical to those by TGSVD and TRGSVD as k increases until the occurrence of semiconvergence. For the other problems, we have observed similar phenomena. These indicate that the three algorithms have the same or highly competitive regularizing effects. Figure 2 depicts the number of inner iterations versus the parameter k. From it, we observe that the number of inner iterations exhibits a considerable decreasing tendency as k increases for the chosen test problems with L = L3 , ε = 10−3 and m = n = 1, 024. LSQR becomes substantially more efficient with k increasing. For L = L1 , we have similar findings. A distinction is that for each problem LSQR uses fewer inner iterations to converge for L1 than for L3 . 4.1.2. The two dimensional case. In this subsection, we test the problem seismicwavetomo which is from [15] and creates a two-dimensional seismic tomography.
Table 3: The comparison of Algorithm 4 (mtrsvd) and the others with L = L3 and ε = 10^{-3}, 10^{-4}.

ε = 10^{-3}       m = n = 1,024                                     m = n = 10,240
           q    tgsvd        weirgsvd     mtrsvd        weirgsvd   mtrsvd
shaw       4    0.1694(8)    0.1694(8)    0.1694(8)     -          0.1431(9)
foxgood    12   0.3568(4)    0.3567(4)    0.3392(4)     -          0.1438(4)
gravity    8    0.2838(10)   0.2830(10)   0.2811(10)    -          0.1789(9)
heat       12   0.1626(30)   0.1616(30)   0.1610(30)    -          0.1468(35)
phillips   16   0.0439(12)   0.0441(12)   0.0424(12)    -          0.0361(14)
deriv2     8    0.3465(10)   0.3499(10)   0.3550(10)    -          0.3129(13)

ε = 10^{-4}       m = n = 1,024                                     m = n = 10,240
           q    tgsvd        weirgsvd     mtrsvd        weirgsvd   mtrsvd
shaw       4    0.1228(9)    0.1228(9)    0.1228(9)     -          0.1222(9)
foxgood    13   0.2739(5)    0.2739(5)    0.2525(5)     -          0.1198(9)
gravity    4    0.2143(14)   0.2143(14)   0.2126(14)    -          0.1516(15)
heat       12   0.1415(36)   0.1419(36)   0.1423(36)    -          0.1192(54)
phillips   8    0.0196(20)   0.0253(19)   0.0252(19)    -          0.0199(21)
deriv2     8    0.2692(21)   0.2697(21)   0.2717(21)    -          0.0201(23)
Fig. 1. The semi-convergence processes of TGSVD, TRGSVD and MTRSVD for the four test problems with L = L3, ε = 10^{-3} and m = n = 1,024: (a) shaw; (b) heat; (c) phillips; (d) deriv2.
Fig. 2. The numbers of inner iterations versus k for Algorithm 4 with L = L3, ε = 10^{-3} and m = n = 1,024: (a) shaw; (b) heat; (c) phillips; (d) deriv2.
We use the code of [15] to generate an ps × N 2 coefficient matrix A, the true solution xtrue and noise-free right-hand side bb. We take N = 32, 70, 100 with default s = N and p = 2N , respectively, that is, we generate A ∈ R2048×1024 , R9800×4900 , R20000×10000 . Table 4 shows the relative errors of the best regularized solutions obtained by TGSVD, TRGSVD and MTRSVD with L = L1 , L3 and ε = 10−3 , 10−4 , respectively. Obviously, the relative errors of the best regularized solutions by our algorithm for seismicwavetomo are at least as accurate as those by the TGSVD algorithm and the TRGSVD algorithm in [31] for the two given ε. We next investigate how our algorithm behaves as the oversampling parameter q varies for this problem with m = 20, 000 and n = 10, 000. We mention that the TGSVD and TRGSVD algorithms are out of memory for this problem. Table 5 shows the relative errors of the best regularized solutions obtained by MTRSVD for varying q with L = L1 , L3 and ε = 10−3 , 10−4 , respectively. As we can see, the relative errors of the best regularized solutions by our algorithm for seismicwavetomo decrease a little bit with the oversampling parameter q. In other words, the bigger q, the bigger the optimal truncation parameter k0 and the more accurate the best regularized solution may be. This confirms our theory that bigger q could generate more accurate rank-k approximation to A, so that the regularized solutions could be more accurate. Figures 3–6 draw the convergence processes of TGSVD, RGSVD and MTRSVD as k increase and the inner iterations versus the parameter k with ε = 10−3 and L = L1 , L3 , respectively. We can see that the best regularized solutions by our MTRSVD algorithm are more accurate than the counterparts by the TGSVD algorithm and the TRGSVD algorithm, especially for L = L3 . The number of inner iterations decreases substantially as k increases. Compared with the results on the
Fig. 3. The relative errors of TGSVD, TRGSVD and MTRSVD, and inner iterations versus k of MTRSVD for the problem seismicwavetomo of m = 2,048, n = 1,024 with L = L1 and ε = 10^{-3}.

Fig. 4. The relative errors of TGSVD, TRGSVD and MTRSVD, and inner iterations versus k of MTRSVD for the problem seismicwavetomo of m = 2,048, n = 1,024 with L = L3 and ε = 10^{-3}.

Fig. 5. The relative errors of TGSVD, TRGSVD and MTRSVD, and inner iterations versus k of MTRSVD for the problem seismicwavetomo of m = 9,800, n = 4,900 with L = L1 and ε = 10^{-3}.

Fig. 6. The relative errors of TGSVD, TRGSVD and MTRSVD, and inner iterations versus k of MTRSVD for the problem seismicwavetomo of m = 9,800, n = 4,900 with L = L3 and ε = 10^{-3}.
Table 4: The relative errors for seismicwavetomo of m = 2,048, n = 1,024.

           L = L1                                          L = L3
ε          q    tgsvd         weirgsvd      mtrsvd         tgsvd         weirgsvd      mtrsvd
10^{-3}    42   0.3117(658)   0.3083(603)   0.2982(586)    0.3777(623)   0.3691(633)   0.3451(604)
10^{-4}    36   0.1575(810)   0.1567(805)   0.1533(777)    0.1731(864)   0.1722(828)   0.1577(760)

The relative errors for seismicwavetomo of m = 9,800, n = 4,900.

           L = L1                                          L = L3
ε          q    tgsvd         weirgsvd      mtrsvd         tgsvd         weirgsvd      mtrsvd
10^{-3}    71   0.6916(1129)  0.6971(1036)  0.6920(1028)   0.8061(1171)  0.8092(1121)  0.8029(1031)
10^{-4}    199  0.5526(1801)  0.5519(1859)  0.5428(1811)   0.6594(1951)  0.6593(1840)  0.6476(1842)

The relative errors for seismicwavetomo of m = 20,000, n = 10,000.

           L = L1                    L = L3
ε          q     mtrsvd              q     mtrsvd
10^{-3}    951   0.7766(1249)        53    0.8949(1147)
10^{-4}    158   0.6747(2142)        149   0.8023(2051)

Table 5: The relative errors for seismicwavetomo of m = 20,000, n = 10,000.

L = L1, ε = 10^{-3}          L = L3, ε = 10^{-4}
q     mtrsvd                 q     mtrsvd
10    0.7907(1110)           12    0.8014(2033)
587   0.7802(1213)           72    0.7985(2078)
951   0.7766(1249)           240   0.7936(2160)
0.8014(2033) 0.7985(2078) 0.7936(2160)
one dimensional problems, however, we observe a remarkable difference that the optimal regularization parameters k0 now become much bigger and this two dimensional ill-posed problem is much harder to solve. The cause is that for this 2-D problem its discrete Picard condition, i.e., the Fourier coefficients |uTi ˆb| do not decay considerably faster than the singular values σi . This means that a good regularized solution must include many SVD components of A. 4.2. The m ≤ n case. We test Algorithm 5, TGSVD, and the two TRGSVD algorithms named as xiangrgsvd [33], weirgsvd [31] on the test problems in Table 1. In Table 6, we display the relative errors of the best regularized solutions obtained by tgsvd, weirtgsvd, xiangrgsvd and mtrsvd, i.e., Algorithm 5, with L = L1 and ε = 10−3 , 10−4 , respectively. The results indicate that when m = n = 1, 024 our MTRSVD algorithm computes the best regularized solution with similar accuracy to those by the TGSVD algorithm and the TRGSVD algorithms weirgsvd and xiangrgsvd for severely and moderately ill-posed problems but the solution accuracy by MTRSVD is much higher than that by xiangrgsvd and weirgsvd for the mildly ill-posed problem deriv2. Actually, the best regularized solutions by xiangrgsvd and weirgsvd have no accuracy since their relative errors are over 300%!
Table 6: The comparison of Algorithm 5 (mtrsvd) and the others with L = L1 and ε = 10^{-3}, 10^{-4}.

ε = 10^{-3}     m = n = 1,024                                                      m = n = 10,240
           q   tgsvd        xiangrgsvd   weirgsvd     mtrsvd       xiangrgsvd   mtrsvd
shaw       6   0.1946(6)    0.1967(7)    0.1967(7)    0.1942(7)    0.1353(9)    0.1311(9)
foxgood    8   0.2783(4)    0.2759(4)    0.2759(4)    0.2743(3)    0.2711(4)    0.1793(4)
gravity    6   0.2556(12)   0.2382(12)   0.2382(12)   0.2577(11)   0.2443(12)   0.2223(12)
heat       8   0.1543(30)   0.1714(30)   0.1714(30)   0.1564(29)   0.1801(32)   0.1523(32)
phillips   5   0.0482(14)   0.0509(14)   0.0509(14)   0.0413(11)   0.0444(15)   0.0386(11)
deriv2     6   0.3342(12)   3.0987(1)    3.0987(1)    0.3790(8)    3.0985(1)    0.3815(8)

ε = 10^{-4}     m = n = 1,024                                                      m = n = 10,240
           q   tgsvd        xiangrgsvd   weirgsvd     mtrsvd       xiangrgsvd   mtrsvd
shaw       4   0.1217(9)    0.1217(9)    0.1217(9)    0.1217(9)    0.0968(10)   0.0967(10)
foxgood    8   0.2172(5)    0.1975(6)    0.1975(6)    0.2283(5)    0.1957(4)    0.1475(4)
gravity    5   0.2167(13)   0.2093(13)   0.2093(13)   0.2166(12)   0.2024(14)   0.1858(14)
heat       9   0.1304(45)   0.1450(44)   0.1450(44)   0.1321(44)   0.2064(46)   0.1526(46)
phillips   10  0.0252(16)   0.0371(18)   0.0371(18)   0.0281(18)   0.0259(21)   0.0189(21)
deriv2     8   0.2542(19)   3.0963(1)    3.0963(1)    0.3709(9)    3.0987(1)    0.3488(10)
Mathematically, the TRGSVD algorithm weirgsvd [31] is the same as xiangrgsvd [33]. Table 6 confirms that these two algorithms compute the same regularized solutions for m = n = 1, 024. For this reason, we only report the results obtained by xiangrgsvd for m = n = 10, 240. Still, xiangrgsvd fails to solve deriv2 and the relative errors of the best regularized solutions are over 300%, but mtrsvd is successful to obtain good regularized solutions. In Table 7, we display the relative errors of best regularized solutions by all the algorithms with L = L3 and ε = 10−3 , 10−4 , respectively. Clearly, we can observe very similar phenomena to those in Table 6. Figure 7 depicts the curves of convergence processes of all the algorithms for the four test problems shaw, heat, phillips and deriv2 with L = L3 , ε = 10−3 and m = n = 1, 024. Figure 8 does the same job for these four problems with L = L3 , ε = 10−3 and m = n = 10, 240. From the two figures, we can see that for the severely ill-posed problem shaw, the relative errors obtained by MTRSVD are almost identical to those by TGSVD and TRGSVD. For the moderately and mildly ill-posed problems, MTRSVD also behaves like TGSVD, but the best regularized solutions obtained by TRGSVD for the mildly ill-posed problem deriv2 have no accuracy and their relative errors are over 300%. Also, we notice that for the moderately ill-posed problem heat with L = L3 , the best regularized solutions by TRGSVD are much less accurate than those by TGSVD and MTRSVD, especially for ε = 10−3 . We also observe from Figure 9 that the number of the inner iterations decrease as k increases for some chosen test problems when L = L3 , ε = 10−3 and m = n = 1, 024. We see that after a few iterations, LSQR only needs two or three hundreds iterations and even no more than one hundred iterations to achieve the prescribed tolerance.
Table 7: The comparison of Algorithm 5 (mtrsvd) and the others with L = L3 and ε = 10^{-3}, 10^{-4}.

ε = 10^{-3}     m = n = 1,024                                                      m = n = 10,240
           q   tgsvd        xiangrgsvd   weirgsvd     mtrsvd       xiangrgsvd   mtrsvd
shaw       6   0.1659(8)    0.1662(8)    0.1662(8)    0.1659(8)    0.1475(9)    0.1371(9)
foxgood    6   0.2965(5)    0.2962(5)    0.2962(5)    0.2729(5)    0.2696(4)    0.1342(4)
gravity    8   0.2686(10)   0.2655(10)   0.2655(10)   0.2668(10)   0.2315(12)   0.1787(12)
heat       9   0.1689(31)   0.2851(30)   0.2851(30)   0.1825(28)   0.1806(35)   0.1402(35)
phillips   13  0.0492(15)   0.0550(14)   0.0550(14)   0.0405(13)   0.0620(14)   0.0390(16)
deriv2     8   0.3374(12)   3.0999(1)    3.0999(1)    0.3891(8)    3.0993(1)    0.3758(9)

ε = 10^{-4}     m = n = 1,024                                                      m = n = 10,240
           q   tgsvd        xiangrgsvd   weirgsvd     mtrsvd       xiangrgsvd   mtrsvd
shaw       4   0.1250(9)    0.1250(9)    0.1250(9)    0.1250(9)    0.0987(10)   0.0975(10)
foxgood    8   0.2667(6)    0.2655(6)    0.2655(6)    0.2516(6)    0.2700(7)    0.1337(5)
gravity    5   0.2167(13)   0.2097(13)   0.2097(13)   0.2160(13)   0.1680(15)   0.1517(15)
heat       13  0.1407(36)   0.1608(36)   0.1608(36)   0.1481(36)   0.1323(54)   0.1272(54)
phillips   12  0.0277(16)   0.0291(16)   0.0291(16)   0.0273(16)   0.0481(21)   0.0389(21)
deriv2     8   0.2536(18)   3.0997(1)    3.0997(1)    0.3382(14)   3.1015(1)    0.3682(10)

Fig. 7. The relative errors of Algorithm 5 with L = L3, ε = 10^{-3} and m = n = 1,024: (a) shaw; (b) heat; (c) phillips; (d) deriv2.

Fig. 8. The relative errors of Algorithm 5 with L = L3, ε = 10^{-3} and m = n = 10,240: (a) shaw; (b) heat; (c) phillips; (d) deriv2.

Fig. 9. Inner iterations versus the truncation parameter k of Algorithm 5 with L = L3, ε = 10^{-3} and m = n = 1,024: (a) shaw; (b) heat; (c) phillips; (d) deriv2.
5. Conclusion. In this paper, we have proposed two MTRSVD algorithms for solving the overdetermined and underdetermined (1.1) with general-form regularization, respectively. We have established a number of sharp error bounds for the approximation accuracy of randomized approximate SVDs and their rank-k truncated ones. These results have improved the existing bounds substantially and provided strong theoretical supports for the effectiveness of randomized algorithms for solving ill-posed problems. We have considered the conditioning of inner least squares problems and shown that it becomes better conditioned as the regularization parameter k increases. As a consequence, LSQR generally converge faster with k and uses fewer iterations to achieve the prescribed tolerance, which has been confirmed numerically. A practical advantage of MTRSVD is its applicability to truly large scale problems, while TGSVD suits only for small to medium sized problems. For the overdetermined problems, the TRGSVD algorithm in [31], though theoretically good, are practically infeasible since it required to compute the GSVD of the large matrix pair {B, L} with B = QT A ∈ Rl×n and invert the large n × n right singular vector matrix. For the underdetermined problems,the TRGSVD algorithms in [31, 33] lack necessary theoretical supports and may not work well. Numerical experiments have demonstrated that our MTRSVD algorithms work very well and can compute regularized solutions with very similar accuracy to those by the standard TGSVD algorithm and the existing TRGSVD algorithms for the overdetermined (1.1). They behave very like TGSVD and can compute much more accurate solutions than the existing TRGSVD algorithms do for some moderately and mildly ill-posed underdetermined (1.1). REFERENCES [1] R. C. Aster, B. Borchers, and C. H. Thurber, Parameter Estimation and Inverse Problems, Second Edition, Elsevier, New York, 2013. ¨ rck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, PA, 1996. [2] ˚ A. Bjo [3] J. J. M. Cuppen, Calculating the isochromes of ventricular depolarization, SIAM J. Sci. Statist. Comput., 5 (1984), pp. 105–120. [4] M. G. Cox, Data approximation by splines in one and two independent variables, in A. Iserles and M. J. D. Powell(Eds.), The State of the Art in Numerical Analysis, Clarendon Press, Oxford, UK,1987, pp. 111–138. [5] L. Eld´ en, A weigthed pseudoinverse, generalized singular values and constrained least squares problems, BIT, 22 (1982), pp. 487–502. [6] H. W. Engl, Regularization methods for the stable solution of inverse problems, Surveys Math. Indust., 3 (1993), pp. 71–143. [7] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, Kluwer Academic Publishers, 2000. [8] G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, 2013. [9] M. Gu, Subspace iteration randomization and singular value problems, SIAM J. Sci. Comput., 37 (2015), pp. A1139–A1173. [10] Y. Gu, W. J. Yu, and Y. H. Li, Efficient randomized algorithms for adaptive low-rank factorizations of large matrices, arXiv:1606.09402, 2016. [11] N. Halko, P. G. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288. [12] P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM, Philadelphia, PA, 1998. 
, Regularization tools version 4.0 for Matlab 7.3, Numer. Algor., 46 (2007), pp. 189–194. [13] , Discrete Inverse Problems: Insight and Algorithms, SIAM, Philadelphia, PA, 2010. [14] [15] P. C. Hansen and M. Saxild-Hansen, AIR tools–a MATLAB package of algebraic iterative reconstruction methods, J. Comput. Appl. Math., 236 (2012), pp. 2167–2178.
[16] P. C. Hansen, T. Sekii, and H. Shibahashi, The modified truncated SVD method for regularizaiton in general form, SIAM J. Sci. Comput., 13 (1992), pp. 1142–1150. [17] B. Hofmann, Regularization for Applied Inverse and Ill-posed Problems, Teubner, Stuttgart, Germany, 1986. [18] Z. Jia, The regularization theory of the Krylov iterative solvers LSQR, CGLS, LSMR and CGME For linear discrete ill-posed problems, arXiv:math.NA/1608.05907, 2016. [19] , The regularization theory of the Krylov iterative solvers LSQR and CGLS for linear discrete ill-posed problems. Part I: the simple singular value case, arXiv: math.NA/1701.05708, 2017. [20] , Regularizing effects of the Krylov iterative solvers CGME and LSMR for linear discrete ill-posed problems with an application to truncated randomized SVD, (2017), manuscript. [21] J. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems, Applied Mathematical Sciences 160, Springer, 2005. [22] M. E. Kilmer, P. C. Hansen, and M. I. Espanol, A projection-based approach to general-form Tikhonov regualarization, SIAM J. Sci. Comput., 29 (2007), pp. 315–330. [23] E. Liberty, F. Woolfe, P.-G. Martinsson, V. Rokhlin, and M. Tygert, Randomized algorithms for the low-rank approximation of matrices, Proc. Natl. Acad. Sci. USA, 104 (2007), pp. 20167–20172. [24] K. Miller, Least squares methods for ill-posed problems with a prescribed bound, SIAM J. Math. Anal., 1 (1970), pp. 52–74. [25] J. L. Mueller and S. Siltanen, Linear and Nonlinear Inverse Problems with Practical Applications, SIAM, Philadelpha, PA, 2012. [26] P.-G. Martinsson, V. Rokhlin, and M. Tygert, A randomized algorithm for the decomposition of matrices, Appl. Comput. Harmon. Anal., 30 (2011), pp. 47–68. [27] X. Meng, M. A. Saunders, and M. W. Mahoney, LSRN: A parallel iterative solver for strongly over-or underdetermined systems, SIAM J. Sci. Comput., 36 (2014), pp. C95–C118. [28] E. Natterer, The Mathematics of Computerized Tomography, John Wiley, New York, 1986. [29] C. C. Paige and M. A. Saunders, LSQR: An algorithm sparse linear equations and sparse least squares, ACM Transactions on Mathematical Software, 8 (1982), pp. 43–71. [30] V. Rokhlin, A. Szlam, and M.Tygert, A randomized algorithm for principal component analysis, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1100–1124. [31] Y. Wei, P. P. Xie, and L. P. Zhang, Tikhonov regularization and randomized GSVD, SIAM J. Matrix Anal. Appl., 37 (2016), pp. 649–675. [32] H. Xiang and J. Zou, Regularization with randomized SVD for large-scale discrete inverse problems, Inverse Probl., 29 (2013), 085008. [33] , Regularization with randomized SVD for large-scale inverse problems with general Tikhonov regularizations, Inverse Probl., 31 (2015), 085008.