ESTIMATION OF THE L-CURVE VIA LANCZOS BIDIAGONALIZATION

D. CALVETTI∗, G. H. GOLUB†, AND L. REICHEL‡

∗ Department of Mathematics, Case Western Reserve University, Cleveland, OH 44106. E-mail: [email protected]. Research supported in part by NSF grant DMS-9404692.
† Department of Computer Science, Stanford University, Stanford, CA 94305. E-mail: [email protected]. Research supported in part by NSF grant CCR-9505393.
‡ Department of Mathematics and Computer Science, Kent State University, Kent, OH 44242. E-mail: [email protected]. Research supported in part by NSF grants DMS-9404706 and ASC-9720221.
Abstract. The L-curve criterion is often applied to determine a suitable value of the regularization parameter when solving ill-conditioned linear systems of equations with a right-hand side contaminated by errors of unknown norm. However, the computation of the L-curve is quite costly for large problems; the determination of a point on the L-curve requires that both the norm of the regularized approximate solution and the norm of the corresponding residual vector be available. Therefore, usually only a few points on the L-curve are computed and these values, rather than the L-curve, are used to determine a value of the regularization parameter. We propose a new approach to determine a value of the regularization parameter based on computing an L-ribbon that contains the L-curve in its interior. An L-ribbon can be computed fairly inexpensively by partial Lanczos bidiagonalization of the matrix of the given linear system of equations. A suitable value of the regularization parameter is then determined from the L-ribbon, and we show that an associated approximate solution of the linear system can be computed with little additional work.

Key words. Ill-posed problem, regularization, L-curve criterion, Gauss quadrature.

1. Introduction. Let
$$Ax = b, \qquad A \in \mathbb{R}^{m\times n}, \quad x \in \mathbb{R}^n, \quad b \in \mathbb{R}^m, \tag{1}$$
be a linear system of equations with a very ill-conditioned matrix and a right-hand side vector that is contaminated by errors of unknown size. It is well known that in order to be able to compute a useful approximate solution of (1), the system has to be replaced by a less ill-conditioned nearby system. This replacement is known as regularization. One of the most popular regularization methods is Tikhonov regularization, in which the linear system (1) is replaced by a minimization problem, e.g.,
$$\min_{x\in\mathbb{R}^n} \{\|Ax - b\|^2 + \mu\|x\|^2\}, \tag{2}$$
where $\mu \ge 0$ is the regularization parameter. Throughout this paper $\|\cdot\|$ denotes the Euclidean norm. The solution $x_\mu$ of (2) also solves the linear system
$$(A^T A + \mu I)x = A^T b, \tag{3}$$
which shows that
$$x_\mu = (A^T A + \mu I)^{-1} A^T b. \tag{4}$$
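To see the equivalence of (2)-(4) concretely, here is a minimal NumPy sketch (ours, with hypothetical sizes; not taken from the paper) that computes $x_\mu$ both from the normal equations (3) and from the stacked least-squares form of (2):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, mu = 40, 20, 1e-3                      # hypothetical sizes and parameter
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# (3)/(4): normal equations of the Tikhonov problem
x_normal = np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ b)

# (2) as a stacked least-squares problem: min || [A; sqrt(mu) I] x - [b; 0] ||
A_stack = np.vstack([A, np.sqrt(mu) * np.eye(n)])
b_stack = np.concatenate([b, np.zeros(n)])
x_lsq, *_ = np.linalg.lstsq(A_stack, b_stack, rcond=None)

print(np.allclose(x_normal, x_lsq))          # True: (2)-(4) define the same x_mu
```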

A proper choice of the value of the regularization parameter $\mu$ is important. If the norm of the error in the right-hand side vector $b$ is known, then the value of $\mu$ can be chosen so that the associated solution (4) of (2) satisfies the Morozov discrepancy principle; see, e.g., [18]. However, in many important applications the norm of the
error in the given right-hand side is not explicitly known, and other approaches to determining a value of the regularization parameter $\mu$ have to be employed.

Lawson and Hanson [17] observed in the '60s that an efficient way to display how the value of the regularization parameter affects the solution $x_\mu$ of (2) and the residual error $b - Ax_\mu$ is to plot the curve $(\|x_\mu\|, \|b - Ax_\mu\|)$ for $\mu \in (0, +\infty)$. This curve is known as the L-curve, because it is shaped roughly like the letter "L." Lawson and Hanson [17], and more recently Hansen and O'Leary [13, 15], proposed to choose the value of $\mu$ that corresponds to the point $(\|x_\mu\|, \|b - Ax_\mu\|)$ at the "vertex" of the "L." We denote this value by $\mu_L$. A heuristic motivation for this choice is that when $\mu > 0$ is "tiny," the associated solution $x_\mu$ is likely to be severely contaminated by propagated errors stemming from the errors in the right-hand side, while when $\mu$ is large, the solution $x_\mu$ of (2) is a poor approximate solution of (1). The choice $\mu = \mu_L$ seeks to balance these two sources of error.

For many problems the L-curve indeed is roughly L-shaped; however, how distinctly the "vertex" of the curve is marked depends on the spectral properties of the matrix $A^T A$. It has been observed that a log-log plot can be helpful in enhancing the sharpness of the "vertex." An analysis of the L-curve and its properties can be found in [13, 15]. Recent work by Engl and Grever [4], Hanke [12] and Vogel [21] points out limitations of the L-curve criterion. Nevertheless, computational experience shows that the L-curve criterion gives suitable values of the regularization parameter for many problems.

A major difficulty when using the L-curve criterion to determine a value of the regularization parameter for large ill-posed problems is that it is expensive to compute points on the L-curve; the determination of each point requires the solution of a minimization problem (2). For large problems, one therefore typically computes only a few points on or close to the L-curve and uses these points to determine a suitable value of the regularization parameter. Numerical issues related to locating the "vertex" of the L-curve in this manner are discussed by Golub and von Matt [11] and Hansen and O'Leary [15]. An approach that combines iterations by the conjugate gradient method with an optimization method for locating the "vertex" has recently been proposed by Kaufman and Neumaier [16].

This paper proposes a new method for approximately determining the location of the "vertex" of the L-curve. We show how rectangular regions that contain points on the L-curve can be determined fairly inexpensively for several values of the regularization parameter, without having to solve the corresponding regularized minimization problems (2). The union of these rectangles determines a ribbon-like region which contains the L-curve in its interior. We refer to this ribbon as the L-ribbon. When the L-ribbon is narrow, it is possible to infer the approximate location of the "vertex" of the L-curve from the shape of the L-ribbon. The L-ribbon is thus a helpful tool for determining a suitable value of the regularization parameter in an interactive computing environment.

The L-ribbon is determined by first computing a partial Lanczos bidiagonalization of the matrix $A$ in (1). The width of the ribbon decreases as the dimensions of the bidiagonal factors increase. It is convenient to determine a suitable width, and thereby a suitable number of steps of the bidiagonalization algorithm, interactively. For other iterative methods that compute a partial Lanczos bidiagonalization of the matrix $A$, we refer to Björck [2] and O'Leary and Simmons [19].

In some applications of Tikhonov regularization, the minimization problem (2) is

replaced by
$$\min_{x\in\mathbb{R}^n} \{\|Ax - b\|^2 + \mu\|Bx\|^2\}, \tag{5}$$

where the matrix $B$, for instance, may be a discretization of a differential operator. Björck [2] and Eldén [3] describe how the minimization problem (5) can be brought into the form (2). We will therefore only consider Tikhonov regularization problems of the form (2) in the present paper.

This paper is organized as follows. In Section 2 we express the coordinates of points on the L-curve in terms of certain matrix functionals of the regularization parameter $\mu$. We can determine upper and lower bounds for the abscissa and ordinate for each point on the L-curve by computing upper and lower bounds for these matrix functionals. Section 3 describes how Gauss and Gauss-Radau quadrature rules can be employed to compute such bounds. The evaluation of these quadrature rules requires that certain bidiagonal matrices be available. Section 4 discusses their computation by the Lanczos bidiagonalization algorithm bidiag1 due to Paige and Saunders [20] applied to the matrix $A$. In Section 5 we present an algorithm for constructing the L-ribbon, and we also show how to use already computed quantities to inexpensively determine an approximate solution of (3) that corresponds to a point in the L-ribbon. A few illustrative numerical examples are presented in Section 6, and Section 7 contains concluding remarks.

2. The L-curve and matrix functionals. Let $\mu > 0$ be a given value of the regularization parameter for Tikhonov regularization and let $P(\mu) = (\|x_\mu\|, \|Ax_\mu - b\|)$ be the coordinates of the corresponding point on the L-curve. It follows from (4) that
$$\|x_\mu\|^2 = x_\mu^T x_\mu = b^T A(A^T A + \mu I)^{-2} A^T b \tag{6}$$
and
$$\|Ax_\mu - b\|^2 = (A(A^T A + \mu I)^{-1} A^T b - b)^T (A(A^T A + \mu I)^{-1} A^T b - b). \tag{7}$$
The identity
$$I - A(A^T A + \mu I)^{-1} A^T = \mu(AA^T + \mu I)^{-1} \tag{8}$$
can be used to simplify the right-hand side of (7), and we obtain $\|Ax_\mu - b\|^2 = \mu^2 b^T (AA^T + \mu I)^{-2} b$. Introduce the function
$$\phi(t) := (t + \mu)^{-2} \tag{9}$$
and define the matrix functionals
$$s := \mu^2 b^T \phi(AA^T) b, \qquad \hat{s} := (A^T b)^T \phi(A^T A) A^T b. \tag{10}$$
Then the coordinates of $P(\mu)$ can be expressed in terms of $s$ and $\hat{s}$,
$$\|Ax_\mu - b\| = s^{1/2}, \qquad \|x_\mu\| = \hat{s}^{1/2}. \tag{11}$$
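As a sanity check of (10) and (11), the following sketch (again ours, on a small dense problem) evaluates $s$ and $\hat{s}$ directly; for large matrices these dense solves are precisely what the quadrature machinery of Sections 3-5 avoids:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, mu = 30, 15, 1e-2                      # hypothetical sizes and parameter
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x_mu = np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ b)   # cf. (4)

# s = mu^2 b^T (AA^T + mu I)^{-2} b, cf. (9)-(10)
M = A @ A.T + mu * np.eye(m)
s = mu**2 * (b @ np.linalg.solve(M, np.linalg.solve(M, b)))

# s_hat = (A^T b)^T (A^T A + mu I)^{-2} A^T b
Mhat = A.T @ A + mu * np.eye(n)
Atb = A.T @ b
s_hat = Atb @ np.linalg.solve(Mhat, np.linalg.solve(Mhat, Atb))

# (11): the coordinates of P(mu) are (s_hat^{1/2}, s^{1/2})
print(np.isclose(np.linalg.norm(b - A @ x_mu), np.sqrt(s)))
print(np.isclose(np.linalg.norm(x_mu), np.sqrt(s_hat)))
```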

We now proceed to write the quantities $s$ and $\hat{s}$ as Stieltjes integrals. Introduce the spectral factorizations
$$AA^T = W \Lambda W^T, \qquad A^T A = \hat{W} \hat{\Lambda} \hat{W}^T, \tag{12}$$
where
$$\Lambda := \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_n, 0, \ldots, 0] \in \mathbb{R}^{m\times m}, \qquad \hat{\Lambda} := \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_n] \in \mathbb{R}^{n\times n},$$
and $W \in \mathbb{R}^{m\times m}$, $W^T W = I_m$, $\hat{W} \in \mathbb{R}^{n\times n}$, $\hat{W}^T \hat{W} = I_n$. Here $I_j$ denotes the identity matrix of order $j$. Let
$$h = [h_1, h_2, \ldots, h_m]^T := \mu W^T b, \qquad \hat{h} = [\hat{h}_1, \hat{h}_2, \ldots, \hat{h}_n]^T := \hat{W}^T A^T b.$$
Then
$$s = h^T \phi(\Lambda) h = \sum_{k=1}^{n} \phi(\lambda_k) h_k^2 + \phi(0) \sum_{k=n+1}^{m} h_k^2 = \int_{-\infty}^{\infty} \phi(t)\, d\omega(t), \tag{13}$$
$$\hat{s} = \hat{h}^T \phi(\hat{\Lambda}) \hat{h} = \sum_{k=1}^{n} \phi(\lambda_k) \hat{h}_k^2 = \int_{-\infty}^{\infty} \phi(t)\, d\hat{\omega}(t). \tag{14}$$

The discrete measures $\omega(t)$ and $\hat{\omega}(t)$ are non-decreasing step functions with jump discontinuities at the eigenvalues $\lambda_k$ and at the origin. We show in the next section how Gauss and Gauss-Radau quadrature rules can be applied to cheaply compute upper and lower bounds for $s$ and $\hat{s}$. We note that Golub and von Matt [9] have described how to compute lower and upper bounds for $\hat{s}$ by Gauss and Gauss-Radau quadrature rules in the context of large-scale constrained least-squares approximation.

3. Gauss quadrature. The quantities $s$ and $\hat{s}$ defined by (10) depend on the parameter $\mu$. Their computation for many different values of $\mu$ is not feasible when the matrix $A$ is very large. However, the computation of lower and upper bounds for $s$ and $\hat{s}$ for several values of $\mu$ can be carried out efficiently by using Gauss and Gauss-Radau quadrature rules. The application of Gauss-type quadrature rules to compute bounds for certain matrix functionals is well-known; see the papers by Golub, Meurant and Strakos [7, 8] and references therein. Recently, Golub and von Matt [11] used this approach to develop a method different from ours for determining a suitable value of the regularization parameter.

We briefly review some known facts about Gauss quadrature rules and their connection with the Lanczos process. Define the inner product induced by the measure $\omega(t)$ introduced in (13),
$$\langle f, g \rangle := \int_{-\infty}^{\infty} f(t) g(t)\, d\omega(t) = \sum_{k=1}^{n} f(\lambda_k) g(\lambda_k) h_k^2 = h^T f(\Lambda) g(\Lambda) h, \tag{15}$$

and let $\{q_k\}_{k=0}^{n-1}$ be the family of orthonormal polynomials with respect to this inner product, i.e.,
$$\langle q_k, q_j \rangle = \begin{cases} 0, & k \ne j, \\ 1, & k = j. \end{cases} \tag{16}$$
The $q_k$ satisfy a three-term recurrence relation of the form
$$t\, q_{k-1}(t) = \beta_k q_k(t) + \alpha_k q_{k-1}(t) + \beta_{k-1} q_{k-2}(t), \qquad k = 1, 2, \ldots, \tag{17}$$

where $q_{-1}(t) := 0$ and $q_0(t) := \langle 1, 1 \rangle^{-1/2}$. It is well-known that $\ell$ steps of the Lanczos algorithm applied to the matrix $AA^T$ with initial vector $b$ yields the symmetric positive definite or semidefinite tridiagonal matrix
$$T_\ell := \begin{bmatrix} \alpha_1 & \beta_1 & & & \\ \beta_1 & \alpha_2 & \beta_2 & & \\ & \ddots & \ddots & \ddots & \\ & & \beta_{\ell-2} & \alpha_{\ell-1} & \beta_{\ell-1} \\ & & & \beta_{\ell-1} & \alpha_\ell \end{bmatrix}, \tag{18}$$
whose entries are the first $2\ell - 1$ coefficients in the recurrence relation (17); see, e.g., [7, 8]. We assume here that $\ell$ is sufficiently small to ensure that $\beta_j > 0$ for $1 \le j < \ell$. Symmetric tridiagonal matrices with positive subdiagonal elements are often referred to as Jacobi matrices. It is convenient to discuss Gauss quadrature in terms of the matrix $T_\ell$ and modifications thereof. Introduce the spectral factorization
$$T_\ell = Y_\ell \Theta_\ell Y_\ell^T, \tag{19}$$
where $Y_\ell \in \mathbb{R}^{\ell\times\ell}$, $Y_\ell^T Y_\ell = I_\ell$ and $\Theta_\ell = \mathrm{diag}[\theta_1, \theta_2, \ldots, \theta_\ell]$ with $\theta_1 < \theta_2 < \ldots < \theta_\ell$. It is well known that the eigenvalues of $T_\ell$ are the zeros of the orthonormal polynomial $q_\ell(t)$, as well as the nodes of the $\ell$-point Gauss quadrature rule
$$G_\ell(f) := \sum_{k=1}^{\ell} f(\theta_k)\eta_k \tag{20}$$
with respect to the measure $\omega(t)$ defined in (13), where $f$ is a function defined on the support of $\omega(t)$. The weights of the quadrature rule (20) are given by $\eta_k := \mu^2\|b\|^2 (e_1^T Y_\ell e_k)^2$, where $e_j$ denotes the $j$th axis vector. Thus,
$$G_\ell(f) = \mu^2\|b\|^2 \sum_{k=1}^{\ell} f(\theta_k)(e_1^T Y_\ell e_k)^2 = \mu^2\|b\|^2 e_1^T Y_\ell f(\Theta_\ell) Y_\ell^T e_1 = \mu^2\|b\|^2 e_1^T f(T_\ell) e_1. \tag{21}$$
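The connection can be illustrated with a short sketch (ours; it forms $AA^T$ explicitly and reorthogonalizes, which is affordable only for small examples). It runs $\ell$ Lanczos steps on $AA^T$ with initial vector $b$, assembles $T_\ell$, and evaluates $G_\ell(\phi)$ via (21); by Proposition 3.2 below, the computed value bounds $s$ from below:

```python
import numpy as np

def lanczos_tridiag(M, v0, ell):
    """ell Lanczos steps on symmetric M with start vector v0; returns T_ell."""
    n = v0.size
    Q = np.zeros((n, ell)); alpha = np.zeros(ell); beta = np.zeros(ell - 1)
    q = v0 / np.linalg.norm(v0)
    for j in range(ell):
        Q[:, j] = q
        w = M @ q
        alpha[j] = q @ w
        w -= alpha[j] * q
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)   # full reorthogonalization
        if j < ell - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

rng = np.random.default_rng(2)
m, n, mu, ell = 60, 30, 1e-2, 5                # hypothetical sizes
A = rng.standard_normal((m, n)); b = rng.standard_normal(m)

T = lanczos_tridiag(A @ A.T, b, ell)
e1 = np.eye(ell)[:, 0]
# (21) with f = phi: G_ell(phi) = mu^2 ||b||^2 e1^T (T + mu I)^{-2} e1
M_T = T + mu * np.eye(ell)
g = mu**2 * (b @ b) * (e1 @ np.linalg.solve(M_T, np.linalg.solve(M_T, e1)))

Mfull = A @ A.T + mu * np.eye(m)
s_exact = mu**2 * (b @ np.linalg.solve(Mfull, np.linalg.solve(Mfull, b)))
print(g <= s_exact + 1e-12, g, s_exact)        # the Gauss rule bounds s from below
```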

Introduce the Cholesky factor of $T_\ell$,
$$C_\ell := \begin{bmatrix} \rho_1 & & & \\ \sigma_2 & \rho_2 & & \\ & \ddots & \ddots & \\ & & \sigma_\ell & \rho_\ell \end{bmatrix}, \tag{22}$$
as well as the matrix
$$\bar{C}_{\ell-1} = \begin{bmatrix} C_{\ell-1} \\ \sigma_\ell e_{\ell-1}^T \end{bmatrix} \in \mathbb{R}^{\ell\times(\ell-1)}, \tag{23}$$
which is made up of the first $\ell - 1$ columns of $C_\ell$.
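A quick numerical illustration (ours) of (22) and (23): the Cholesky factor of a Jacobi matrix is lower bidiagonal, so $C_\ell$ and $\bar{C}_{\ell-1}$ are determined by only $O(\ell)$ numbers:

```python
import numpy as np

ell = 5
alpha = np.full(ell, 4.0)                  # hypothetical Jacobi-matrix entries
beta = np.full(ell - 1, 1.0)
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

C = np.linalg.cholesky(T)                  # lower triangular Cholesky factor
# only the diagonal and first subdiagonal are nonzero, as in (22)
assert np.allclose(np.tril(C, -2), 0.0)

C_bar = C[:, :ell - 1]                     # the first ell-1 columns, cf. (23)
print(np.allclose(C @ C.T, T), C_bar.shape)
```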

In the case of particular interest to us, when $f = \phi$ with $\phi$ defined by (9), formula (21) provides an efficient way to evaluate the Gauss quadrature rule when the Cholesky factor $C_\ell$ of the matrix $T_\ell$ is available; see Section 5. We will discuss how to compute the matrix $C_\ell$ directly, without computing $T_\ell$, in Section 4. We use the matrix $\bar{C}_{\ell-1}$ in a representation analogous to (21) of the $\ell$-point Gauss-Radau quadrature rule
$$R_\ell(f) = \sum_{k=1}^{\ell} f(\tilde{\theta}_k)\tilde{\eta}_k \tag{24}$$
associated with the measure $\omega(t)$ with one assigned node, say $\tilde{\theta}_1$, at the origin.

Proposition 3.1. The $\ell$-point Gauss-Radau quadrature rule (24) with respect to the measure $\omega(t)$ and with one node assigned at the origin can be expressed as
$$R_\ell(f) = \mu^2\|b\|^2 e_1^T f(\bar{C}_{\ell-1}\bar{C}_{\ell-1}^T) e_1. \tag{25}$$

Proof. The representation (25) has been shown by Golub and von Matt [10]. We outline the short proof since formula (25) is central for our algorithm. The following formula for the Gauss-Radau rule (24) is analogous to (21), $R_\ell(f) = \mu^2\|b\|^2 e_1^T f(\tilde{T}_\ell) e_1$, where
$$\tilde{T}_\ell := T_\ell - (\alpha_\ell - \beta_{\ell-1}^2 e_{\ell-1}^T T_{\ell-1}^{-1} e_{\ell-1}) e_\ell e_\ell^T; \tag{26}$$
see, e.g., Golub [6] for a proof. Using the representation $T_{\ell-1} = C_{\ell-1} C_{\ell-1}^T$ and the fact that $C_{\ell-1}^{-1} e_{\ell-1} = \rho_{\ell-1}^{-1} e_{\ell-1}$, we obtain
$$e_{\ell-1}^T T_{\ell-1}^{-1} e_{\ell-1} = e_{\ell-1}^T C_{\ell-1}^{-T} C_{\ell-1}^{-1} e_{\ell-1} = \rho_{\ell-1}^{-2}.$$
Substituting this expression and $\beta_{\ell-1} = \rho_{\ell-1}\sigma_\ell$ into (26) shows that $\tilde{T}_\ell = \bar{C}_{\ell-1}\bar{C}_{\ell-1}^T$, and formula (25) follows.

Let $f$ be a $2\ell$ times differentiable function. Introduce the quadrature error
$$E_{Q_\ell}(f) := \int_{-\infty}^{\infty} f(t)\, d\omega(t) - Q_\ell(f),$$
where $Q_\ell = G_\ell$ or $Q_\ell = R_\ell$. In the former case, there exists $\theta_{G_\ell}$, $\lambda_1 \le \theta_{G_\ell} \le \lambda_n$, such that
$$E_{G_\ell}(f) = \frac{f^{(2\ell)}(\theta_{G_\ell})}{(2\ell)!} \int_{-\infty}^{\infty} \prod_{i=1}^{\ell} (t - \theta_i)^2\, d\omega(t), \tag{27}$$

where $f^{(j)}$ denotes the $j$th derivative of the function $f$. On the other hand, if $Q_\ell = R_\ell$ then there exists $\theta_{R_\ell}$, $\lambda_1 \le \theta_{R_\ell} \le \lambda_n$, such that
$$E_{R_\ell}(f) = \frac{f^{(2\ell-1)}(\theta_{R_\ell})}{(2\ell - 1)!} \int_{-\infty}^{\infty} t \prod_{i=2}^{\ell} (t - \tilde{\theta}_i)^2\, d\omega(t), \tag{28}$$
where we recall that $\tilde{\theta}_1 = 0$. We refer to Golub and Meurant [7] for more details.

Proposition 3.2. Assume that $\mu > 0$ and let $\phi$ be defined by (9). Then, for $\ell \ge 1$,
$$E_{G_\ell}(\phi) > 0, \qquad E_{R_\ell}(\phi) < 0. \tag{29}$$

Proof. Analogous results have been shown, e.g., by Golub and Meurant [7]. The $j$th derivative of $\phi$ is given by
$$\phi^{(j)}(t) = (-1)^j (j + 1)! (t + \mu)^{-j-2},$$
and it follows that $\phi^{(2\ell-1)}(t) < 0$ and $\phi^{(2\ell)}(t) > 0$ for $t \ge 0$. Substituting these inequalities into (27) and (28), and using the fact that all the jump discontinuities of $\omega(t)$ are on the nonnegative real axis, shows (29).

It follows from Proposition 3.2 that upper and lower bounds for the quantity $s$ defined by (10) can be determined by replacing the Stieltjes integral in (13) by Gauss and Gauss-Radau quadrature rules.

We turn to bounds for the quantity $\hat{s}$ defined by (10). Let $\hat{T}_\ell \in \mathbb{R}^{\ell\times\ell}$ denote the tridiagonal matrix obtained by $\ell$ steps of the Lanczos process applied to $A^T A$ with initial vector $A^T b$, and let $\hat{C}_\ell$ denote the lower bidiagonal Cholesky factor of $\hat{T}_\ell$. Then analogously to the representation (21), the $\ell$-point Gauss quadrature rule with respect to the measure $\hat{\omega}(t)$ can be written as
$$\hat{G}_\ell(\phi) = \|A^T b\|^2 e_1^T \phi(\hat{T}_\ell) e_1. \tag{30}$$
Similarly to (25), the $\ell$-point Gauss-Radau rule associated with the measure $\hat{\omega}(t)$ with one node assigned at the origin can be evaluated as
$$\hat{R}_\ell(\phi) = \|A^T b\|^2 e_1^T \phi(\bar{\hat{C}}_{\ell-1}(\bar{\hat{C}}_{\ell-1})^T) e_1, \tag{31}$$
where the matrix $\bar{\hat{C}}_{\ell-1}$ consists of the first $\ell - 1$ columns of $\hat{C}_\ell$. We obtain analogously to (29) that
$$E_{\hat{G}_\ell}(\phi) > 0, \qquad E_{\hat{R}_\ell}(\phi) < 0. \tag{32}$$

Thus, lower and upper bounds for the quantity $\hat{s}$ defined by (10) can be determined by replacing the Stieltjes integral in (14) by the Gauss and Gauss-Radau quadrature rules (30) and (31), respectively.

4. Computation of the bidiagonal matrices. We describe how the lower bidiagonal Cholesky factors $C_\ell$ and $\hat{C}_\ell$ of the tridiagonal matrices $T_\ell$ and $\hat{T}_\ell$, respectively, can be computed directly without forming $T_\ell$ and $\hat{T}_\ell$. While the Lanczos algorithm applied to the matrix $AA^T$ with initial vector $b$ yields the matrix $T_\ell$, the closely related bidiagonalization algorithm bidiag1 by Paige and Saunders [20], when applied to the matrix $A$ with initial vector $b$, gives the Cholesky factor $C_\ell$; see below.

Algorithm 1 (Lanczos Bidiagonalization Algorithm).
Input: $b \in \mathbb{R}^m$, $A \in \mathbb{R}^{m\times n}$, $0 < \ell < n$;
Output: $\{u_j\}_{j=1}^{\ell+1}$, $\{v_j\}_{j=1}^{\ell}$, $\{\rho_j\}_{j=1}^{\ell}$, $\{\sigma_j\}_{j=1}^{\ell+1}$;

$\sigma_1 := \|b\|$; $u_1 := b/\sigma_1$; $\tilde{v}_1 := A^T u_1$; $\rho_1 := \|\tilde{v}_1\|$; $v_1 := \tilde{v}_1/\rho_1$;
for $j = 2, 3, \ldots, \ell$ do
  $\tilde{u}_j := A v_{j-1} - \rho_{j-1} u_{j-1}$; $\sigma_j := \|\tilde{u}_j\|$; $u_j := \tilde{u}_j/\sigma_j$;
  $\tilde{v}_j := A^T u_j - \sigma_j v_{j-1}$; $\rho_j := \|\tilde{v}_j\|$; $v_j := \tilde{v}_j/\rho_j$;
end j;
$\tilde{u}_{\ell+1} := A v_\ell - \rho_\ell u_\ell$; $\sigma_{\ell+1} := \|\tilde{u}_{\ell+1}\|$; $u_{\ell+1} := \tilde{u}_{\ell+1}/\sigma_{\ell+1}$;

We assume that the parameter $\ell$ in Algorithm 1 is chosen small enough so that $\rho_j > 0$ and $\sigma_j > 0$ for $1 \le j \le \ell$. Then the algorithm determines the matrices $U_\ell = [u_1, u_2, \ldots, u_\ell] \in \mathbb{R}^{m\times\ell}$ and $V_\ell = [v_1, v_2, \ldots, v_\ell] \in \mathbb{R}^{n\times\ell}$ with orthonormal columns, as well as a lower bidiagonal matrix $C_\ell$ of the form (22). Our implementation of Algorithm 1 used for the numerical experiments of Section 6 reorthogonalizes the columns of $U_\ell$ and $V_\ell$ in order to ensure that they are numerically orthogonal. It follows from the recursion formulas of Algorithm 1 that
$$AV_\ell = U_\ell C_\ell + \sigma_{\ell+1} u_{\ell+1} e_\ell^T, \qquad A^T U_\ell = V_\ell C_\ell^T, \qquad b = \sigma_1 U_\ell e_1. \tag{33}$$
Combining these equations shows that
$$AA^T U_\ell = U_\ell C_\ell C_\ell^T + \sigma_{\ell+1}\rho_\ell u_{\ell+1} e_\ell^T. \tag{34}$$
It is straightforward to verify that $\{u_j\}_{j=1}^{\ell+1}$ are Lanczos vectors and $T_\ell := C_\ell C_\ell^T$ is the tridiagonal matrix (18) obtained when applying $\ell$ steps of the Lanczos algorithm to the matrix $AA^T$ with initial vector $b$. Further, multiplying the equation on the left in (33) by $A^T$ shows that $\{v_j\}_{j=1}^{\ell}$ are the Lanczos vectors generated by applying $\ell$ steps of the Lanczos algorithm to the matrix $A^T A$ with starting vector $A^T b$. It is easy to see that the corresponding tridiagonal matrix is given by $\hat{T}_\ell = \bar{C}_\ell^T \bar{C}_\ell$, where the lower bidiagonal $(\ell+1)\times\ell$ matrix $\bar{C}_\ell$ is defined analogously to (23).

The lower bidiagonal Cholesky factor $\hat{C}_\ell$ of $\hat{T}_\ell$ can be determined, without forming $\hat{T}_\ell$, from the QR-factorization $\bar{C}_\ell = \bar{Q}_\ell \hat{C}_\ell^T$, where $\bar{Q}_\ell \in \mathbb{R}^{(\ell+1)\times\ell}$ satisfies $\bar{Q}_\ell^T \bar{Q}_\ell = I_\ell$ and $\hat{C}_\ell^T$ is upper bidiagonal. We note that the matrix $\bar{Q}_\ell$ can be represented by $\ell$ Givens rotations, and therefore $\hat{C}_\ell$ can be computed from $\bar{C}_\ell$ in only $O(\ell)$ arithmetic operations.
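A direct NumPy transcription of Algorithm 1 may be helpful; the sketch below (ours; reorthogonalization included as described above, variable names hypothetical) also checks the relations (33) and the QR-factorization $\bar{C}_\ell = \bar{Q}_\ell \hat{C}_\ell^T$:

```python
import numpy as np

def lanczos_bidiag(A, b, ell, reorth=True):
    """Algorithm 1 (bidiag1): returns U (m x (ell+1)), V (n x ell), rho, sigma."""
    m, n = A.shape
    U = np.zeros((m, ell + 1)); V = np.zeros((n, ell))
    rho = np.zeros(ell); sigma = np.zeros(ell + 1)
    sigma[0] = np.linalg.norm(b); U[:, 0] = b / sigma[0]
    v = A.T @ U[:, 0]; rho[0] = np.linalg.norm(v); V[:, 0] = v / rho[0]
    for j in range(1, ell + 1):
        u = A @ V[:, j - 1] - rho[j - 1] * U[:, j - 1]
        if reorth:
            u -= U[:, :j] @ (U[:, :j].T @ u)
        sigma[j] = np.linalg.norm(u); U[:, j] = u / sigma[j]
        if j < ell:
            v = A.T @ U[:, j] - sigma[j] * V[:, j - 1]
            if reorth:
                v -= V[:, :j] @ (V[:, :j].T @ v)
            rho[j] = np.linalg.norm(v); V[:, j] = v / rho[j]
    return U, V, rho, sigma

rng = np.random.default_rng(3)
m, n, ell = 50, 25, 6
A = rng.standard_normal((m, n)); b = rng.standard_normal(m)
U, V, rho, sigma = lanczos_bidiag(A, b, ell)

# C_ell as in (22): diagonal rho, subdiagonal sigma_2..sigma_ell
C = np.diag(rho) + np.diag(sigma[1:ell], -1)
C_bar = np.vstack([C, sigma[ell] * np.eye(ell)[-1]])   # (ell+1) x ell

# the relations (33)
print(np.allclose(A @ V, U[:, :ell] @ C + sigma[ell] * np.outer(U[:, ell], np.eye(ell)[-1])))
print(np.allclose(A.T @ U[:, :ell], V @ C.T))

# C_hat from the QR-factorization C_bar = Q_bar C_hat^T
Q_bar, R = np.linalg.qr(C_bar)   # R is upper bidiagonal here, up to row signs
C_hat = R.T                      # sign differences do not affect C_hat C_hat^T
print(np.allclose(C_hat @ C_hat.T, C_bar.T @ C_bar))   # = T_hat_ell
```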

5. Estimation of the regularization parameter by the L-ribbon. We show how to compute upper and lower bounds for the abscissa and ordinate of one point on the L-curve corresponding to a fixed value of the regularization parameter. Then we discuss how bounds for the coordinates of other points on the L-curve can be computed without carrying out additional matrix-vector products with the matrices $A$ and $A^T$. This allows us to inexpensively determine a ribbon-like region, the L-ribbon, which contains the L-curve in its interior. The ribbon is determined by first computing the bidiagonal matrix $C_\ell$ as well as the coefficient $\sigma_{\ell+1}$ by Algorithm 1. The width of the ribbon decreases as $\ell$ increases; when $\ell$ in Algorithm 1 is sufficiently large, the L-ribbon has zero width and coincides with the L-curve. We also show how to cheaply compute an approximate solution $x_{\mu,\ell}$ of (2) and (3) associated with a desired value of the regularization parameter.

Assume for the moment that the value of the regularization parameter $\mu$ is given, and let $P_\mu = (\|x_\mu\|, \|b - Ax_\mu\|)$ be the associated point on the L-curve. After application of Algorithm 1, we can compute upper and lower bounds for $\|b - Ax_\mu\|$ by determining upper and lower bounds for the quantity $s$ associated with this value of the regularization parameter; see (11). The latter are computed by approximating the integral in (13) by the $\ell$-point Gauss and $(\ell+1)$-point Gauss-Radau quadrature rules. It follows from Proposition 3.2 that
$$G_\ell(\phi) \le s \le R_{\ell+1}(\phi) \tag{35}$$
and (21) and (25) yield
$$G_\ell(\phi) = \mu^2\|b\|^2 e_1^T (C_\ell C_\ell^T + \mu I_\ell)^{-2} e_1, \tag{36}$$

$$R_{\ell+1}(\phi) = \mu^2\|b\|^2 e_1^T (\bar{C}_\ell \bar{C}_\ell^T + \mu I_{\ell+1})^{-2} e_1. \tag{37}$$
We evaluate the Gauss quadrature rule (36) by first determining the vector $y := (C_\ell C_\ell^T + \mu I_\ell)^{-1} e_1$. This vector is computed as the solution of the least-squares problem
$$\min_{y\in\mathbb{R}^\ell} \left\| \begin{bmatrix} C_\ell^T \\ \mu^{1/2} I_\ell \end{bmatrix} y - \mu^{-1/2} \begin{bmatrix} 0 \\ e_1 \end{bmatrix} \right\|, \qquad 0 \in \mathbb{R}^\ell.$$

Eldén [3] and Gander [5] describe how this problem can be solved with the aid of Givens rotations. Note that the matrix $C_\ell$ is independent of the regularization parameter $\mu$. Therefore, given this matrix, the Gauss rule (36) can be evaluated in only $O(\ell)$ arithmetic floating point operations for each value of $\mu$. The scaling factor $\|b\|$ in the quadrature rule is computed in Algorithm 1. The evaluation of the Gauss-Radau rule (37) can be carried out similarly.

We determine bounds for $\|x_\mu\|$ by computing bounds for the quantity $\hat{s}$ defined in (10). It follows from (32) that, analogously to (35), we have
$$\hat{G}_\ell(\phi) \le \hat{s} \le \hat{R}_\ell(\phi), \tag{38}$$
where
$$\hat{G}_\ell(\phi) = \|A^T b\|^2 e_1^T (\hat{C}_\ell \hat{C}_\ell^T + \mu I_\ell)^{-2} e_1, \qquad \hat{R}_\ell(\phi) = \|A^T b\|^2 e_1^T (\bar{\hat{C}}_{\ell-1} (\bar{\hat{C}}_{\ell-1})^T + \mu I_\ell)^{-2} e_1.$$
We evaluate these quadrature rules similarly as (36). The scaling factor $\|A^T b\| = \rho_1\sigma_1$ is determined in Algorithm 1.

Combining (11) and the inequalities (35) and (38) yields the following bounds for the coordinates of the point $P_\mu = (\|x_\mu\|, \|b - Ax_\mu\|)$ on the L-curve:
$$(G_\ell(\phi))^{1/2} \le \|b - Ax_\mu\| \le (R_{\ell+1}(\phi))^{1/2}, \qquad (\hat{G}_\ell(\phi))^{1/2} \le \|x_\mu\| \le (\hat{R}_\ell(\phi))^{1/2}.$$
Introduce the quantities
$$y^-(\mu) := (G_\ell(\phi))^{1/2}, \tag{39}$$
$$y^+(\mu) := (R_{\ell+1}(\phi))^{1/2}, \tag{40}$$
$$x^-(\mu) := (\hat{G}_\ell(\phi))^{1/2}, \tag{41}$$
$$x^+(\mu) := (\hat{R}_\ell(\phi))^{1/2}, \tag{42}$$
for $\mu > 0$. We define the L-ribbon as the union of the rectangular regions with vertices (39)-(42),
$$\bigcup_{\mu>0} \{(x(\mu), y(\mu)) : x^-(\mu) \le x(\mu) \le x^+(\mu),\; y^-(\mu) \le y(\mu) \le y^+(\mu)\}.$$

The following algorithm determines rectangles associated with the parameter values $\mu_j$, $1 \le j \le p$.

Algorithm 2 (L-Ribbon Algorithm).
Input: $b \in \mathbb{R}^m$, $A \in \mathbb{R}^{m\times n}$, $\ell$, $\{\mu_j\}_{j=1}^p$;
Output: $\{x_j^+\}_{j=1}^p$, $\{x_j^-\}_{j=1}^p$, $\{y_j^+\}_{j=1}^p$, $\{y_j^-\}_{j=1}^p$;

i) Apply Algorithm 1 to compute the entries of the bidiagonal matrix $\bar{C}_\ell$. Determine $\hat{C}_\ell$ by QR-factorization of $\bar{C}_\ell$.
ii) for $j = 1, 2, \ldots, p$ do
  $\mu := \mu_j$; % The function $\phi$ below depends on $\mu$ %
  Evaluate $G_\ell(\phi)$, $\hat{G}_\ell(\phi)$, $R_{\ell+1}(\phi)$, $\hat{R}_\ell(\phi)$;
  $x_j^- := x^-(\mu_j)$; $x_j^+ := x^+(\mu_j)$; $y_j^- := y^-(\mu_j)$; $y_j^+ := y^+(\mu_j)$;
end j
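A compact rendering of Algorithm 2 (ours; it reuses the hypothetical `lanczos_bidiag` and `bounds_at` helpers from the earlier sketches):

```python
import numpy as np
# assumes lanczos_bidiag (Section 4 sketch) and bounds_at (previous sketch)

def l_ribbon(A, b, ell, mus):
    U, V, rho, sigma = lanczos_bidiag(A, b, ell)           # step i)
    C = np.diag(rho) + np.diag(sigma[1:ell], -1)           # C_ell, cf. (22)
    C_bar = np.vstack([C, sigma[ell] * np.eye(ell)[-1]])   # (ell+1) x ell
    C_hat = np.linalg.qr(C_bar)[1].T                       # Cholesky factor of T_hat
    return [bounds_at(mu, C, C_bar, C_hat, sigma[0], rho[0]) for mu in mus]  # step ii)

# usage: one (x^-, x^+, y^-, y^+) rectangle per mu_j, e.g. for the grid of Section 6
# rects = l_ribbon(A, b, 5, [10.0**(-7 + j / 4) for j in range(16)])
```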

We now show how to inexpensively evaluate an approximate solution of (3) that corresponds to a point in the L-ribbon.

Theorem 5.1. Let $\mu > 0$ be a desired value of the regularization parameter and let $x_{\mu,\ell}$ be an associated approximate solution of (3) of the form $x_{\mu,\ell} = V_\ell y_{\mu,\ell}$ determined by the Galerkin equation
$$V_\ell^T (A^T A + \mu I_n) V_\ell y_{\mu,\ell} = V_\ell^T A^T b. \tag{43}$$
Then
$$\|x_{\mu,\ell}\| = (\hat{G}_\ell(\phi))^{1/2} \tag{44}$$
and
$$\|b - Ax_{\mu,\ell}\| = (R_{\ell+1}(\phi))^{1/2}. \tag{45}$$

Proof. Using the formulas (33), we can simplify the Galerkin equations (43) to
$$(\bar{C}_\ell^T \bar{C}_\ell + \mu I_\ell) y_{\mu,\ell} = \sigma_1 C_\ell^T e_1, \tag{46}$$
and in view of that $C_\ell^T e_1 = \rho_1 e_1$ and $\bar{C}_\ell^T \bar{C}_\ell = \hat{T}_\ell$, we obtain that $y_{\mu,\ell} = \rho_1\sigma_1 (\hat{T}_\ell + \mu I_\ell)^{-1} e_1$. Thus,
$$x_{\mu,\ell}^T x_{\mu,\ell} = y_{\mu,\ell}^T y_{\mu,\ell} = \rho_1^2\sigma_1^2 e_1^T (\hat{T}_\ell + \mu I_\ell)^{-2} e_1 = \hat{G}_\ell(\phi),$$
where we have used that $\rho_1\sigma_1 = \|A^T b\|$. This shows (44).

We turn to the proof of (45). We may write the right-hand side of (46) as $\sigma_1 \bar{C}_\ell^T e_1$, which yields the expression
$$y_{\mu,\ell} = \sigma_1 (\bar{C}_\ell^T \bar{C}_\ell + \mu I_\ell)^{-1} \bar{C}_\ell^T e_1. \tag{47}$$
Substituting (47) into the right-hand side of
$$\|A V_\ell y_{\mu,\ell} - b\|^2 = \|U_{\ell+1}(\bar{C}_\ell y_{\mu,\ell} - \sigma_1 e_1)\|^2 = \|\bar{C}_\ell y_{\mu,\ell} - \sigma_1 e_1\|^2$$
yields
$$\|A V_\ell y_{\mu,\ell} - b\|^2 = \|\sigma_1 \bar{C}_\ell (\bar{C}_\ell^T \bar{C}_\ell + \mu I_\ell)^{-1} \bar{C}_\ell^T e_1 - \sigma_1 e_1\|^2 = \mu^2\sigma_1^2 \|(\bar{C}_\ell \bar{C}_\ell^T + \mu I_{\ell+1})^{-1} e_1\|^2 = \mu^2\|b\|^2 e_1^T \phi(\bar{C}_\ell \bar{C}_\ell^T) e_1 = R_{\ell+1}(\phi),$$

where we have used the identity (8) with $A$ replaced by $\bar{C}_\ell$ and the representation (25) of the $(\ell+1)$-point Gauss-Radau rule.

The solution $y_{\mu,\ell}$ of (43) and (46) can be computed as the solution of the least-squares problem
$$\min_{y\in\mathbb{R}^\ell} \left\| \begin{bmatrix} \bar{C}_\ell \\ \mu^{1/2} I_\ell \end{bmatrix} y - \sigma_1 e_1 \right\|, \tag{48}$$
where $\sigma_1$ is defined in Algorithm 1. We remark that (46) are the normal equations associated with the problem (48).
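In code, the extra work per value of $\mu$ is one small least-squares solve (48) and one multiplication by $V_\ell$; a sketch under the same assumptions as the previous ones:

```python
import numpy as np
# assumes V, sigma, C_bar from the Algorithm 1 / L-ribbon sketches

def tikhonov_galerkin(V, C_bar, sigma1, mu):
    """x_{mu,ell} = V_ell y_{mu,ell}, with y_{mu,ell} solving (48)."""
    ell = C_bar.shape[1]
    G = np.vstack([C_bar, np.sqrt(mu) * np.eye(ell)])
    d = np.zeros(2 * ell + 1); d[0] = sigma1    # the vector sigma_1 e_1
    y, *_ = np.linalg.lstsq(G, d, rcond=None)
    return V @ y

# By Theorem 5.1, ||x_{mu,ell}|| = x^-(mu) and ||b - A x_{mu,ell}|| = y^+(mu),
# so this iterate sits at the vertex (x^-(mu), y^+(mu)) of its ribbon rectangle.
```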

6. Computed examples. The computations of this section were carried out with Matlab on an HP9000/777 workstation. We seek to solve the Fredholm integral equation of the first kind
$$\int_0^{\pi} \exp(s\cos(t))\, x(t)\, dt = 2\,\frac{\sinh(s)}{s}, \qquad 0 \le s \le \frac{\pi}{2},$$
considered by Baart [1]. The integral equation is discretized by a Galerkin method with orthonormal box functions. We used the Matlab code by Hansen [14] for computing the nonsymmetric matrix $A \in \mathbb{R}^{200\times 200}$ and the right-hand side vector $b_{\mathrm{exact}} \in \mathbb{R}^{200}$. A "noise vector" $w \in \mathbb{R}^{200}$ is generated by first determining a vector whose components are normally distributed random numbers with zero mean and variance one, and then scaling this vector to the desired length. The right-hand side in the linear system (1) to be solved is defined as $b := b_{\mathrm{exact}} + w$.

Example 5.1. The noise vector $w$ is scaled to have norm $1\cdot 10^{-3}$. We apply Algorithm 2 with values of the regularization parameter $\mu_j := 1\cdot 10^{-7+(j-1)/4}$, $1 \le j \le 16$. The number of bidiagonalization steps $\ell$ in Algorithms 1 and 2 is chosen to be 5. Rectangles of the L-ribbon associated with the values $\mu_j$ generated by Algorithm 2 are displayed in Figure 1. For all values of the regularization parameter $\mu_j$ used, the associated quantities $y^-(\mu_j)$ and $y^+(\mu_j)$ are very close. Therefore the rectangles with vertices $\{x^\pm(\mu_j), y^\pm(\mu_j)\}$ that make up the L-ribbon look like horizontal intervals when the values of $x^-(\mu_j)$ and $x^+(\mu_j)$ are not very close, and like points when $x^-(\mu_j)$ and $x^+(\mu_j)$ are close. Rectangles on the left-hand side of the figure are associated with the largest values $\mu_j$ and rectangles on the right-hand side with the smallest values $\mu_j$. The vertices $\{x^-(\mu_j), y^+(\mu_j)\}$ of these rectangles are marked by "x". These vertices correspond to the approximate solutions $x_{\mu_j,5}$ of (2) and (3) discussed in Theorem 5.1. When the rectangular region is "tiny" only the vertex "x" is visible.

Figure 1 also displays points on the L-curve associated with the values $\mu_j$ of the regularization parameter. These points are marked by small circles "o". In the present example, the vertices "x" coincide with the points on the L-curve associated with the same value of the regularization parameter to plotting accuracy.

We can locate the "vertex" of the L-curve using Figure 1 and in this way determine a suitable value of the regularization parameter. The figure suggests that $\mu := \mu_{11} = 1\cdot 10^{-9/2}$ is an appropriate value (the $\mu_j$ are enumerated from the right to the left in the figure). The rectangles of the L-ribbon close to the vertex are "tiny", and therefore the use of only $\ell = 5$ bidiagonalization steps in Algorithm 2 would, in general, yield an approximate solution $x_{\mu_{11},5}$ of (3) of sufficient accuracy. As pointed out in Section 5, this approximate solution is very inexpensive to compute.
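The noise model is straightforward to reproduce; a sketch (ours; Hansen's Matlab generator [14] produces `A` and `bexact`, which here stand in as given inputs):

```python
import numpy as np

def add_noise(bexact, noise_norm, seed=0):
    """Add a normally distributed perturbation w scaled so that ||w|| = noise_norm."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(bexact.size)
    w *= noise_norm / np.linalg.norm(w)
    return bexact + w, w

# b, w = add_noise(bexact, 1e-3)                  # ||w|| = 1e-3 as in Example 5.1
# mus = [10.0**(-7 + j / 4) for j in range(16)]   # mu_j = 10^{-7+(j-1)/4}
```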

[Figure 1: "Rectangles of the L-ribbon, points "x" in the ribbon and points "o" on the L-curve"; horizontal axis $\log_{10}\|x_\mu\|$, vertical axis $\log_{10}\|b - Ax_\mu\|$.]

Fig. 1. Example 5.1: $\ell = 5$ in Alg. 2 and $\|w\| = 1\cdot 10^{-3}$.

We envision the L-ribbon to be used in an interactive manner; a user would plot rectangles of the ribbon associated with certain values of the regularization parameter

and a certain number of bidiagonalization steps $\ell$. The plotted output would guide the selection of new values of the regularization parameter, and would show whether the number of bidiagonalization steps $\ell$ in Algorithm 2 needs to be increased.

Figure 1 shows that the rectangles of the L-ribbon increase in size for small values of $\mu$; a more detailed image of this behavior is provided by Figure 2. The latter figure is generated by computing rectangles of the L-ribbon for $\mu_j := 1\cdot 10^{-7+(j-1)/40}$, $1 \le j \le 40$. Figures 1 and 2 show that the Gauss and Gauss-Radau quadrature rules yield higher accuracy for the larger values $\mu_j$. This is because the integrand (9) has a pole at $t = -\mu_j$, and these poles are closer to the spectra of $A^T A$ and $AA^T$ the closer $\mu_j$ is to the origin. The presence of a pole close to the spectra of these matrices reduces the accuracy of the Gauss and Gauss-Radau rules.

The size of the rectangles decreases as the number of bidiagonalization steps $\ell$ increases for a fixed value of the regularization parameter $\mu$. For many problems more than $\ell = 5$ bidiagonalization steps are required in order to determine a useful L-ribbon and approximate solution $x_{\mu_j,\ell}$ of (3) and (1).

[Figure 2: "Rectangles of the L-ribbon"; horizontal axis $-0.1 + \log_{10}\|x_\mu\|$ (scale $\times 10^{-3}$), vertical axis $3 + \log_{10}\|b - Ax_\mu\|$ (scale $\times 10^{-3}$).]

Fig. 2. Example 5.1: $\ell = 5$ in Alg. 2 and $\|w\| = 1\cdot 10^{-3}$.

Figure 3 displays the differences between $x^-(\mu)$ and $x^+(\mu)$, as well as between $y^-(\mu)$ and $y^+(\mu)$, in a different manner. The graphs on the left-hand side show $x^-(\mu)$ (dashed curves) and $x^+(\mu)$ (continuous curves) for $\ell = 4$ and $\ell = 5$ bidiagonalization steps by Algorithm 1, where $x^-(\mu)$ is defined by (41) and $x^+(\mu)$ by (42). The graphs are $\log_{10}$-$\log_{10}$ plots. Note that for fixed values of $\mu$ the difference between $x^-(\mu)$ and $x^+(\mu)$ decreases as the number of bidiagonalization steps $\ell$ is increased from 4 to 5.

For the larger values of $\mu$, the curves for $x^-(\mu)$ and $x^+(\mu)$ agree to plotting accuracy. The graphs on the right-hand side of Figure 3 show $y^-(\mu)$ (dashed curves) and $y^+(\mu)$ (continuous curves) for $\ell = 4$ and $\ell = 5$, where $y^-(\mu)$ is given by (39) and $y^+(\mu)$ by (40). The graphs are $\log_{10}$-$\log_{10}$ plots. For the larger values of the regularization parameter, the curves for $y^-(\mu)$ and $y^+(\mu)$ agree to plotting accuracy.

[Figure 3: four panels titled "l=4 in Algorithms 1 and 2" (top) and "l=5 in Algorithms 1 and 2" (bottom), showing $\log_{10} x^\pm(\mu)$ (left) and $\log_{10} y^\pm(\mu)$ (right) versus $\log_{10}\mu$.]

Fig. 3. Example 5.1: The top left-hand side graph shows $\log_{10} x^-(\mu)$ (dashed curve) and $\log_{10} x^+(\mu)$ (continuous curve) for $\ell = 4$ bidiagonalization steps with Algorithm 1. The top right-hand side graph shows $\log_{10} y^-(\mu)$ (dashed curve) and $\log_{10} y^+(\mu)$ (continuous curve) for $\ell = 4$ bidiagonalization steps with Algorithm 1. The bottom graphs display the analogous graphs for $\ell = 5$ bidiagonalization steps.

7. Conclusion. We have demonstrated that the L-ribbon is a versatile tool for the interactive solution of large-scale linear discrete ill-posed problems. The ribbon often is inexpensive to determine, and yields insight both into how to select a suitable value of the regularization parameter and into how many Lanczos steps to carry out for the computation of an approximate solution of (1).

Acknowledgement. We would like to thank Åke Björck for proposing the use of the algorithm bidiag1 and for helpful suggestions.

REFERENCES

[1] M. L. Baart, The use of auto-correlation for pseudo-rank determination in noisy ill-conditioned least-squares problems, IMA J. Numer. Anal. 2 (1982), pp. 241–247.
[2] Å. Björck, A bidiagonalization algorithm for solving large and sparse ill-posed systems of linear equations, BIT 28 (1988), pp. 659–670.
[3] L. Eldén, Algorithms for the regularization of ill-conditioned least squares problems, BIT 17 (1977), pp. 134–145.


[4] H. W. Engl and W. Grever, Using the L-curve for determining optimal regularization parameters, Numer. Math. 69 (1994), pp. 25–31.
[5] W. Gander, Least squares with a quadratic constraint, Numer. Math. 36 (1981), pp. 291–307.
[6] G. H. Golub, Some modified matrix eigenvalue problems, SIAM Review 15 (1973), pp. 318–334.
[7] G. H. Golub and G. Meurant, Matrices, moments and quadrature, in Numerical Analysis 1993, eds. D. F. Griffiths and G. A. Watson, Longman, Essex, England, 1994, pp. 105–156.
[8] G. H. Golub and Z. Strakos, Estimates in quadratic formulas, Numer. Algor. 8 (1994), pp. 241–268.
[9] G. H. Golub and U. von Matt, Quadratically constrained least squares and quadratic problems, Numer. Math. 59 (1991), pp. 561–580.
[10] G. H. Golub and U. von Matt, Generalized cross-validation for large scale problems, J. Comput. Graph. Stat. 6 (1997), pp. 1–34.


[11] G. H. Golub and U. von Matt, Tikhonov regularization for large scale problems, in Workshop on Scientific Computing, eds. G. H. Golub, S. H. Lui, F. Luk and R. Plemmons, Springer, New York, 1997, pp. 3–26.
[12] M. Hanke, Limitations of the L-curve method in ill-posed problems, BIT 36 (1996), pp. 287–301.
[13] P. C. Hansen, Analysis of discrete ill-posed problems by means of the L-curve, SIAM Review 34 (1992), pp. 561–580.
[14] P. C. Hansen, Regularization tools: A Matlab package for analysis and solution of discrete ill-posed problems, Numer. Algor. 6 (1994), pp. 1–35. Software is available in Netlib at http://www.netlib.org.
[15] P. C. Hansen and D. P. O'Leary, The use of the L-curve in the regularization of discrete ill-posed problems, SIAM J. Sci. Comput. 14 (1993), pp. 1487–1503.
[16] L. Kaufman and A. Neumaier, Regularization of ill-posed problems by envelope guided conjugate gradients, J. Comput. Graph. Stat. 6 (1997), pp. 451–463.
[17] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems, SIAM, Philadelphia, PA, 1995.
[18] V. A. Morozov, On the solution of functional equations by the method of regularization, Soviet Math. Dokl. 7 (1966), pp. 414–417.
[19] D. P. O'Leary and J. A. Simmons, A bidiagonalization-regularization procedure for large scale discretizations of ill-posed problems, SIAM J. Sci. Stat. Comput. 2 (1981), pp. 474–489.
[20] C. C. Paige and M. A. Saunders, LSQR: An algorithm for sparse linear equations and sparse least squares, ACM Trans. Math. Software 8 (1982), pp. 43–71.
[21] C. R. Vogel, Non-convergence of the L-curve regularization parameter selection method, Inverse Problems 12 (1996), pp. 535–547.
