Cahier du GERAD G-2018-40
LNLQ: AN ITERATIVE METHOD FOR LEAST-NORM PROBLEMS WITH AN ERROR MINIMIZATION PROPERTY*
RON ESTRIN†, DOMINIQUE ORBAN‡, AND MICHAEL A. SAUNDERS§
Abstract. We describe LNLQ for solving the least-norm problem min $\|x\|$ subject to $Ax = b$. Craig's method is known to be equivalent to applying the conjugate gradient method to the normal equations of the second kind ($AA^T y = b$, $x = A^T y$). LNLQ is equivalent to applying SYMMLQ to the same equations. If an underestimate of the smallest singular value is available, upper bounds on the errors in both $x$ and $y$ are available at each iteration. LNLQ is a companion method to the least-squares solver LSLQ (Estrin, Orban, and Saunders, 2017), which is equivalent to SYMMLQ on the conventional normal equations. We show that the error upper bounds are tight and compare them with the bounds suggested by Arioli (2013) for CRAIG. A sliding-window technique allows us to tighten the error upper bound in $y$ at the expense of a few additional scalar operations per iteration.
Key words. Linear least-norm problem, error minimization, SYMMLQ, conjugate gradient method, CRAIG.
1. Introduction. We wish to solve the least-norm problem
\[
(1) \qquad \min_{x \in \mathbb{R}^n} \ \tfrac{1}{2}\|x\|^2 \quad \text{subject to} \quad Ax = b,
\]
where $\|\cdot\|$ denotes the Euclidean norm, $A \in \mathbb{R}^{m \times n}$, and the constraints are assumed to be consistent. Any solution $(x_\star, y_\star)$ satisfies the normal equations of the second kind
\[
AA^T y = b, \qquad x = A^T y,
\]
or equivalently the saddle-point system
\[
(2) \qquad
\begin{bmatrix} -I & A^T \\ A & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} 0 \\ b \end{bmatrix}.
\]
The main objective of this paper is to devise an iterative method and accompanying reliable upper bounds on the errors $\|x_k - x_\star\|$ and $\|y_k - y_\star\|$. Existing iterative methods tailored to the solution of (1) include CRAIG (Craig, 1955) and LSQR (Paige and Saunders, 1982a,b). LSQR does not provide any convenient such upper bounds. CRAIG generates iterates $x_k$ that are updated along orthogonal directions, so that it is possible to devise an upper bound on the error in $x$ (Arioli, 2013), but it does not update the iterates $y_k$ along orthogonal directions. CRAIG and LSQR turn out to be formally equivalent to the method of conjugate gradients (CG) (Hestenes and Stiefel, 1952) and MINRES (Paige and Saunders, 1975) applied to (2), respectively, but are more reliable when $A$ is ill-conditioned. By construction, LNLQ is formally equivalent to SYMMLQ applied to (2). LNLQ inherits beneficial properties of SYMMLQ, including orthogonal updates to $y_k$, cheap transfers to the CRAIG point, and cheap upper bounds on the error $\|y_k - y_\star\|$.
* Version of June 15, 2018.
† Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA. E-mail: [email protected].
‡ GERAD and Department of Mathematics and Industrial Engineering, École Polytechnique, Montréal, QC, Canada. E-mail: [email protected]. Research partially supported by an NSERC Discovery Grant.
§ Systems Optimization Laboratory, Department of Management Science and Engineering, Stanford University, Stanford, CA, USA. E-mail: [email protected]. Research partially supported by the National Institute of General Medical Sciences of the National Institutes of Health, award U01GM102098.
Commit 82ae2b0 by Dominique Orban on 2017-10-27 19:12:49 -0400
R. ESTRIN, D. ORBAN, AND M. A. SAUNDERS
Motivation. Linear systems of the form (2) occur during evaluation of the value and gradient of a certain penalty function for equality-constrained optimization (Fletcher, 1973; Estrin, Friedlander, Orban, and Saunders, 2018). Our main motivation is to devise reliable termination criteria that allow control of the error in the solution of (1), thereby allowing us to evaluate inexact gradients cheaply while maintaining global convergence properties of the underlying optimization method. Our approach follows the philosophy of Estrin, Orban, and Saunders (2016) and Estrin et al. (2017) and requires an estimate of the smallest singular value of $A$. Although such an estimate may not always be available in practice, good underestimates are readily available in many optimization problems, including PDE-constrained problems; see section 7. Arioli (2013) develops an upper bound on the error in $x$ along the CRAIG iterations based on an appropriate Gauss-Radau quadrature (Golub and Meurant, 1997), and suggests the seemingly simplistic upper bound $\|y_k - y_\star\| \le \|x_k - x_\star\|/\sigma_r$, where $\sigma_r$ is the smallest nonzero singular value of $A$.

The remainder of this paper is outlined as follows. Section 2 gives background on the Golub and Kahan (1965) process and CRAIG. Sections 3–6 derive LNLQ from the Golub and Kahan process, highlight relationships to CRAIG, derive error bounds, and discuss regularization and preconditioning. Numerical experiments are given in section 7. Extensions to quasi-definite systems are given in section 8, followed by concluding remarks in section 9.

Notation. We use Householder notation: $A$, $b$, $\beta$ for matrix, vector, scalar, with the exception of $c$ and $s$ denoting scalars that define reflections. All vectors are columns, but the slightly abusive notation $(\xi_1, \dots, \xi_k)$ is sometimes used to enumerate their components in the text. Unless specified otherwise, $\|A\|$ and $\|x\|$ denote the Euclidean norm of matrix $A$ and vector $x$.
For symmetric positive definite $M$, we define the $M$-norm of $u$ via $\|u\|_M^2 := u^T M u$. We order the singular values of $A$ according to $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)} \ge 0$, and $A^\dagger$ denotes the Moore-Penrose pseudoinverse of $A$.

2. Background.

2.1. The Golub-Kahan process. The Golub and Kahan (1965) process applied to $A$ with starting vector $b$ is described as Algorithm 1. In line 1, $\beta_1 u_1 = b$ is short for "$\beta_1 = \|b\|$; if $\beta_1 = 0$ then exit; else $u_1 = b/\beta_1$". Similarly for line 2 and the main loop. In exact arithmetic, the algorithm terminates with $k = \ell \le \min(m,n)$ and either $\alpha_{\ell+1} = 0$ or $\beta_{\ell+1} = 0$. Paige (1974) explains that if $Ax = b$ is consistent, the process must terminate with $\beta_{\ell+1} = 0$.
Algorithm 1 Golub-Kahan Bidiagonalization Process
Require: $A$, $b$
1: $\beta_1 u_1 = b$
2: $\alpha_1 v_1 = A^T u_1$
3: for $k = 1, 2, \dots$ do
4:   $\beta_{k+1} u_{k+1} = A v_k - \alpha_k u_k$
5:   $\alpha_{k+1} v_{k+1} = A^T u_{k+1} - \beta_{k+1} v_k$
6: end for
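As a concrete illustration, Algorithm 1 can be sketched in a few lines of NumPy. This is a dense sketch for small problems only; the function name and layout are ours, and no breakdown test ($\beta_{k+1} = 0$ or $\alpha_{k+1} = 0$) is included.

```python
import numpy as np

def golub_kahan(A, b, k):
    """Run k steps of Algorithm 1.  Returns alphas (alpha_1..alpha_{k+1}),
    betas (beta_1..beta_{k+1}), U (m x (k+1)) and V (n x (k+1))."""
    m, n = A.shape
    U = np.zeros((m, k + 1))
    V = np.zeros((n, k + 1))
    alphas, betas = [], []
    beta = np.linalg.norm(b)
    betas.append(beta)                    # beta_1
    U[:, 0] = b / beta                    # u_1
    w = A.T @ U[:, 0]
    alpha = np.linalg.norm(w)
    alphas.append(alpha)                  # alpha_1
    V[:, 0] = w / alpha                   # v_1
    for j in range(k):
        u = A @ V[:, j] - alphas[j] * U[:, j]
        beta = np.linalg.norm(u)
        betas.append(beta)                # beta_{j+2}
        U[:, j + 1] = u / beta
        v = A.T @ U[:, j + 1] - beta * V[:, j]
        alpha = np.linalg.norm(v)
        alphas.append(alpha)              # alpha_{j+2}
        V[:, j + 1] = v / alpha
    return np.array(alphas), np.array(betas), U, V
```

In exact arithmetic the output satisfies relation (4a), $A V_k = U_{k+1} B_k$, and $U_{k+1}$, $V_{k+1}$ have orthonormal columns; both properties can be checked numerically for small random data.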
We define $U_k := [u_1 \ \cdots \ u_k]$, $V_k := [v_1 \ \cdots \ v_k]$, and
\[
(3) \qquad
L_k := \begin{bmatrix}
\alpha_1 \\
\beta_2 & \alpha_2 \\
& \ddots & \ddots \\
& & \beta_k & \alpha_k
\end{bmatrix},
\qquad
B_k := \begin{bmatrix}
\alpha_1 \\
\beta_2 & \alpha_2 \\
& \ddots & \ddots \\
& & \beta_k & \alpha_k \\
& & & \beta_{k+1}
\end{bmatrix}
= \begin{bmatrix} L_k \\ \beta_{k+1} e_k^T \end{bmatrix}.
\]
After $k$ iterations of Algorithm 1, the following hold to machine precision:
\[
(4a) \qquad A V_k = U_{k+1} B_k,
\]
\[
(4b) \qquad A^T U_{k+1} = V_k B_k^T + \alpha_{k+1} v_{k+1} e_{k+1}^T = V_{k+1} L_{k+1}^T,
\]
while the identities $U_k^T U_k = I_k$ and $V_k^T V_k = I_k$ hold only in exact arithmetic. The next sections assume that these identities do hold, allowing us to derive certain norm estimates that seem reliable in practice until high accuracy is achieved in $x$ and $y$.
2.2. CRAIG. For problem (1), the method of Craig (1955) was originally derived as a form of the conjugate gradient (CG) method (Hestenes and Stiefel, 1952) applied to (2). Paige (1974) provided a description based on Algorithm 1:
\[
(5) \qquad L_k t_k = \beta_1 e_1, \qquad x_k^C := V_k t_k = x_{k-1}^C + \tau_k v_k,
\]
where $t_k := (\tau_1, \dots, \tau_k)$ and the components of $t_k$ can be found recursively from $\tau_1 = \beta_1/\alpha_1$, $\tau_j = -\beta_j \tau_{j-1}/\alpha_j$ ($j \ge 2$). If we suppose $t_k = L_k^T \bar y_k^C$ for some vector $\bar y_k^C$ that exists but need not be computed, we see that
\[
(6) \qquad x_k^C = V_k L_k^T \bar y_k^C = A^T U_k \bar y_k^C = A^T y_k^C,
\]
where $y_k^C := U_k \bar y_k^C$ provides approximations to $y$. If we define $D_k = [d_1 \ \cdots \ d_k]$ from $D_k L_k^T = U_k$, we may compute the vectors $d_j$ recursively from $d_1 = u_1/\alpha_1$, $d_j = (u_j - \beta_j d_{j-1})/\alpha_j$ ($j \ge 2$) and then update
\[
y_k^C = D_k L_k^T \bar y_k^C = D_k t_k = y_{k-1}^C + \tau_k d_k.
\]
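The recurrences above translate directly to code. The following is a minimal dense NumPy sketch under our own naming; it assumes the process does not break down in the first $k$ steps and performs no convergence test.

```python
import numpy as np

def craig(A, b, k):
    """CRAIG via the recurrences (5): tau_1 = beta_1/alpha_1,
    tau_j = -beta_j tau_{j-1}/alpha_j, x_k = x_{k-1} + tau_k v_k, and
    d_j = (u_j - beta_j d_{j-1})/alpha_j, y_k = y_{k-1} + tau_k d_k."""
    beta = np.linalg.norm(b); u = b / beta
    w = A.T @ u
    alpha = np.linalg.norm(w); v = w / alpha
    tau = beta / alpha            # tau_1
    d = u / alpha                 # d_1
    x = tau * v                   # x_1^C
    y = tau * d                   # y_1^C
    for _ in range(k - 1):
        u_new = A @ v - alpha * u
        beta = np.linalg.norm(u_new); u = u_new / beta
        w = A.T @ u - beta * v
        alpha = np.linalg.norm(w); v = w / alpha
        tau = -beta * tau / alpha
        d = (u - beta * d) / alpha
        x = x + tau * v
        y = y + tau * d
    return x, y
```

For a full-row-rank $A$ with $m < n$ and consistent $b$, the iterate $x_m^C$ equals the minimum-norm solution in exact arithmetic, and $x_k^C = A^T y_k^C$ throughout; both can be verified on small random data.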
To see the equivalence with CG on (2), note that relations (4) yield
\[
(7) \qquad A A^T U_k = A V_k L_k^T = U_{k+1} B_k L_k^T = U_{k+1} H_k,
\]
\[
(8) \qquad H_k := B_k L_k^T = \begin{bmatrix} L_k L_k^T \\ \alpha_k \beta_{k+1} e_k^T \end{bmatrix},
\]
which we recognize as the result of $k$ iterations of the Lanczos (1950) process applied to $AA^T$ with starting vector $b$, where
\[
(9) \qquad T_k := L_k L_k^T = \begin{bmatrix}
\bar\alpha_1 & \bar\beta_2 \\
\bar\beta_2 & \bar\alpha_2 & \ddots \\
& \ddots & \ddots & \bar\beta_k \\
& & \bar\beta_k & \bar\alpha_k
\end{bmatrix}
\]
is the Cholesky factorization of the Lanczos tridiagonal $T_k$, with $\bar\alpha_1 := \alpha_1^2$, $\bar\alpha_j := \alpha_j^2 + \beta_j^2$ and $\bar\beta_j := \alpha_{j-1}\beta_j$ for $j \ge 2$. Note that $T_k \bar y_k^C = L_k L_k^T \bar y_k^C = L_k t_k = \beta_1 e_1$. CG defines $y_k^C = U_k \bar y_k^C$, and so we have the same iterates as CRAIG:
\[
x_k^C = A^T y_k^C = A^T U_k \bar y_k^C = V_k L_k^T \bar y_k^C = V_k t_k = x_{k-1}^C + \tau_k v_k.
\]
Note that whereas $D_k$ is not orthogonal, $x_k^C$ in (5) is updated along orthogonal directions and
\[
\|x_k^C\|^2 = \sum_{j=1}^k \tau_j^2,
\]
i.e., $\|x_k^C\|$ is monotonically increasing and $\|x_\star - x_k^C\|$ is monotonically decreasing. Arioli (2013) exploits these facts to compute upper and lower bounds on the error $\|x_\star - x_k^C\|$ and an upper bound on $\|y_\star - y_k^C\|$. Although it is not apparent in the above derivation, the equivalence with CG applied to (2) shows that $\|y_k^C\|$ is monotonically increasing and $\|y_\star - y_k^C\|$ is monotonically decreasing (Hestenes and Stiefel, 1952, Theorem 6:3). Unfortunately, the fact that $y_k^C$ is not updated along orthogonal directions makes it more difficult to monitor $\|y_\star - y_k^C\|$ and to develop upper and lower bounds. Arioli (2013) suggests the upper bound $\|y_\star - y_k^C\| \le \|x_\star - x_k^C\|/\sigma_n$ when $A$ has full row rank. LNLQ provides an alternative upper bound on $\|y_\star - y_k^C\|$ that may be tighter. The residual for CRAIG is
\[
(10) \qquad r_k^C := b - A x_k^C = \beta_1 u_1 - A V_k t_k = U_{k+1}(\beta_1 e_1 - B_k t_k) = -\beta_{k+1} \tau_k u_{k+1}.
\]
Other results may be found scattered in the literature. For completeness, we gather them here and provide proofs.
Proposition 1. Let $x_\star$ be the solution of (1) and $y_\star$ the associated Lagrange multiplier with minimum norm, i.e., the minimum-norm solution of (2). The $k$th CRAIG iterates $x_k^C$ and $y_k^C$ solve
\[
(11) \qquad \min_x \ \|x - x_\star\| \quad \text{subject to} \quad x \in \mathrm{Range}(V_k),
\]
\[
(12) \qquad \min_y \ \|y - y_\star\|_{AA^T} \quad \text{subject to} \quad y \in \mathrm{Range}(U_k),
\]
respectively. In addition, $x_k^C$ and $y_k^C$ solve
\[
(13) \qquad \min_x \ \|x\| \quad \text{subject to} \quad x \in \mathrm{Range}(V_k), \ b - Ax \perp \mathrm{Range}(U_k),
\]
\[
(14) \qquad \min_y \ \|y\|_{AA^T} \quad \text{subject to} \quad y \in \mathrm{Range}(U_k), \ b - AA^T y \perp \mathrm{Range}(U_k).
\]
When $A$ is row-rank-deficient, the $(AA^T)$-norm should be interpreted as a norm when restricted to $\mathrm{Range}(A)$.
Proof. Assume temporarily that $A$ has full row rank, so that $AA^T$ is symmetric positive definite. Then there exists a unique $y_\star$ such that $x_\star = A^T y_\star$ and
\[
\|x_k^C - x_\star\| = \|A^T(y_k^C - y_\star)\| = \|y_k^C - y_\star\|_{AA^T}.
\]
In words, the Euclidean norm of the error in $x$ is the energy norm of the error in $y$. Theorem 6:1 of Hestenes and Stiefel (1952) ensures that $y_k^C$ is chosen to minimize the energy norm of the error over all $y \in \mathrm{Range}(U_k)$, i.e., $y_k^C$ solves (12).
To $y \in \mathrm{Range}(U_k)$ there corresponds $x = A^T y \in \mathrm{Range}(A^T U_k) = \mathrm{Range}(V_k L_k^T) = \mathrm{Range}(V_k)$ by (4) because $L_k$ is nonsingular. Consequently, CRAIG generates $x_k^C$ as a solution of (11). When $A$ is rank-deficient, our assumption that $Ax = b$ is consistent ensures that $AA^T y = b$ is also consistent because if there exists a subspace of solutions $x$, it is possible to pick the one that solves (2), and therefore $b \in \mathrm{Range}(AA^T)$. Kammerer and Nashed (1972) show that in the consistent singular case, CG converges to the minimum-norm solution, i.e., to $y_\star$, the solution of
\[
\min_y \ \|y\| \quad \text{subject to} \quad AA^T y = b.
\]
Let $r < \min(m,n)$ be such that $\sigma_r > 0$ and $\sigma_{r+1} = \cdots = \sigma_{\min(m,n)} = 0$. Then $\mathrm{rank}(A) = r = \dim \mathrm{Range}(A)$ and the smallest nonzero eigenvalue of $AA^T$ is $\sigma_r^2$. The Rayleigh-Ritz theorem states that
\[
\sigma_r^2 = \min \{ \|A^T w\|^2 \mid w \in \mathrm{Range}(A), \ \|w\| = 1 \}.
\]
By (4), each $u_k \in \mathrm{Range}(A)$, and (7) and (9) imply that $U_k^T A A^T U_k = T_k$ in exact arithmetic. Thus for any $t \in \mathbb{R}^k$ such that $\|t\| = 1$, we have $\|U_k t\| = 1$ and
\[
t^T U_k^T A A^T U_k t = t^T T_k t \ge \sigma_r^2,
\]
so that the $T_k$ are uniformly positive definite and CG iterations occur as if CG were applied to the positive-definite reduced system $P_r^T A A^T P_r \tilde y = P_r^T b$, where $P_r$ is the $m \times r$ matrix of orthogonal eigenvectors of $AA^T$ corresponding to nonzero eigenvalues. Thus in the rank-deficient case, $y_k^C$ also solves (12) except that the energy "norm" is only a norm when restricted to $\mathrm{Range}(A)$, and $x_k^C$ also solves (11). To establish (13), note that (5) and (10) imply that $x_k^C$ is primal feasible for (13). Dual feasibility requires that there exist vectors $\bar x$, $\bar y$ and $\bar z$ such that $x = \bar z + A^T U_k \bar y$, $V_k^T \bar z = 0$ and $x = V_k \bar x$. The first two conditions are equivalent to $V_k^T x = 0 + V_k^T A^T U_k \bar y = B_k^T U_{k+1}^T U_k \bar y = L_k^T \bar y$. Because $x = V_k \bar x$, this amounts to $\bar x = L_k^T \bar y$. Thus dual feasibility is satisfied with $\bar x := t_k$, $\bar y := \bar y_k^C$ and $\bar z := 0$. The proof of (14) is similar.
3. LNLQ. We define LNLQ as equivalent in exact arithmetic to SYMMLQ (Paige and Saunders, 1975) applied to (2). Whereas SYMMLQ is based on the Lanczos (1950) process, LNLQ is based on Algorithm 1. Again we seek an approximation $y_k^L = U_k \bar y_k^L$. The $k$th iteration of SYMMLQ applied to (2) computes $\bar y_k^L$ as the solution of
\[
(15) \qquad \min_{\bar y} \ \tfrac{1}{2}\|\bar y\|^2 \quad \text{subject to} \quad H_{k-1}^T \bar y = \beta_1 e_1,
\]
where $H_{k-1}^T$ is the top $(k-1) \times k$ submatrix of $T_k$ (9).

3.1. An LQ factorization. In SYMMLQ, the computation of $\bar y_k^L$ follows from the LQ factorization of $H_{k-1}^T$, which can be derived implicitly via the LQ factorization of $T_k = L_k L_k^T$. As $L_k$ is already lower triangular, we only need the factorization
\[
(16) \qquad L_k^T = \bar M_k Q_k, \qquad
\bar M_k := \begin{bmatrix}
\varepsilon_1 \\
\eta_2 & \varepsilon_2 \\
& \ddots & \ddots \\
& & \eta_k & \bar\varepsilon_k
\end{bmatrix}
= \begin{bmatrix} M_{k-1} & \\ \eta_k e_{k-1}^T & \bar\varepsilon_k \end{bmatrix},
\]
where $Q_k^T = Q_{1,2} Q_{2,3} \cdots Q_{k-1,k}$ is orthogonal and defined as a product of reflections, where $Q_{j-1,j}$ is the identity except for elements at the intersection of rows and columns $j-1$ and $j$. Initially, $\bar\varepsilon_1 = \alpha_1$ and $Q_1 = I$. Subsequent factorization steps may be represented as
\[
\begin{bmatrix} \eta_{j-1} & \bar\varepsilon_{j-1} & \\ & \beta_j & \alpha_j \end{bmatrix}
\begin{bmatrix} 1 & & \\ & c_j & s_j \\ & s_j & -c_j \end{bmatrix}
=
\begin{bmatrix} \eta_{j-1} & \varepsilon_{j-1} & \\ & \eta_j & \bar\varepsilon_j \end{bmatrix},
\]
where the rows shown are $j-1$ and $j$ and the columns are $j-2$, $j-1$ and $j$, with the understanding that $\eta_{j-1}$ is absent when $j = 2$. For $j \ge 2$, $Q_{j-1,j}$ is defined by
\[
\varepsilon_{j-1} = \sqrt{\bar\varepsilon_{j-1}^2 + \beta_j^2}, \qquad
c_j = \bar\varepsilon_{j-1}/\varepsilon_{j-1}, \qquad
s_j = \beta_j/\varepsilon_{j-1},
\]
and the application of $Q_{j-1,j}$ results in
\[
(17) \qquad \eta_j = \alpha_j s_j, \qquad \bar\varepsilon_j = -\alpha_j c_j.
\]
We may write $H_{k-1}^T = \begin{bmatrix} T_{k-1} & \alpha_{k-1}\beta_k e_{k-1} \end{bmatrix} = L_{k-1} \begin{bmatrix} L_{k-1}^T & \beta_k e_{k-1} \end{bmatrix}$. From (16),
\[
L_k^T = \begin{bmatrix} L_{k-1}^T & \beta_k e_{k-1} \\ & \alpha_k \end{bmatrix}
= \begin{bmatrix} M_{k-1} & \\ \eta_k e_{k-1}^T & \bar\varepsilon_k \end{bmatrix} Q_k
\quad \Longrightarrow \quad
\begin{bmatrix} L_{k-1}^T & \beta_k e_{k-1} \end{bmatrix}
= \begin{bmatrix} M_{k-1} & 0 \end{bmatrix} Q_k.
\]
Finally, we obtain the LQ factorization
\[
(18) \qquad H_{k-1}^T = L_{k-1} \begin{bmatrix} M_{k-1} & 0 \end{bmatrix} Q_k.
\]
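The factorization (16)-(17) can be sanity-checked numerically. The sketch below builds $\bar M_k$ and $Q_k$ explicitly from the recurrences; names are ours, and dense matrices are used for verification only.

```python
import numpy as np

def lq_of_LT(alphas, betas):
    """Build the factors of (16): L_k^T = Mbar_k Q_k, where L_k is lower
    bidiagonal with diagonal alphas = (alpha_1..alpha_k) and subdiagonal
    betas = (beta_2..beta_k)."""
    k = len(alphas)
    Mbar = np.zeros((k, k))
    Q = np.eye(k)
    eps_bar = alphas[0]                        # eps_bar_1 = alpha_1
    for j in range(1, k):                      # reflection Q_{j,j+1} in paper indices
        eps = np.hypot(eps_bar, betas[j - 1])  # eps = sqrt(eps_bar^2 + beta^2)
        c, s = eps_bar / eps, betas[j - 1] / eps
        Mbar[j - 1, j - 1] = eps
        Mbar[j, j - 1] = alphas[j] * s         # eta
        eps_bar = -alphas[j] * c               # new eps_bar
        G = np.eye(k)                          # the reflection as a dense matrix
        G[j - 1:j + 1, j - 1:j + 1] = [[c, s], [s, -c]]
        Q = G @ Q                              # Q_k = Q_{k-1,k} ... Q_{1,2}
    Mbar[k - 1, k - 1] = eps_bar
    return Mbar, Q
```

Multiplying the factors back together should reproduce $L_k^T$ exactly, and $Q_k$ should be orthogonal.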
3.2. Definition and update of the LNLQ and CRAIG iterates. In order to solve $H_{k-1}^T \bar y_k^L = \beta_1 e_1$ using (18), we already have $L_{k-1} t_{k-1} = \beta_1 e_1$, with the next iteration giving $\tau_k = -\beta_k \tau_{k-1}/\alpha_k$. Next, we consider $M_{k-1} z_{k-1} = t_{k-1}$ and find the components of $z_{k-1} = (\zeta_1, \dots, \zeta_{k-1})$ recursively as $\zeta_1 = \tau_1/\varepsilon_1$, $\zeta_j = (\tau_j - \eta_j \zeta_{j-1})/\varepsilon_j$ ($j \ge 2$). This time, the next iteration yields $\bar\zeta_k = (\tau_k - \eta_k \zeta_{k-1})/\bar\varepsilon_k$ and $\zeta_k = \bar\zeta_k \bar\varepsilon_k/\varepsilon_k = c_{k+1} \bar\zeta_k$. Thus,
\[
(19) \qquad
\bar y_k^L = Q_k^T \begin{bmatrix} z_{k-1} \\ 0 \end{bmatrix}
\quad \text{and} \quad
\bar y_k^C = Q_k^T \begin{bmatrix} z_{k-1} \\ \bar\zeta_k \end{bmatrix} = Q_k^T \bar z_k
\]
solve (15) and $T_k \bar y_k^C = \beta_1 e_1$ respectively, matching the definition of the CRAIG iterate. By construction, $y_k^L = U_k \bar y_k^L$ and $y_k^C = U_k \bar y_k^C$. We define the orthogonal matrix
\[
\bar W_k = U_k Q_k^T = \begin{bmatrix} w_1 & \cdots & w_{k-1} & \bar w_k \end{bmatrix}
= \begin{bmatrix} W_{k-1} & \bar w_k \end{bmatrix},
\qquad \bar w_1 := u_1,
\]
so that (19) with $z_{k-1}$ and $\bar z_k := (z_{k-1}, \bar\zeta_k)$ yields the orthogonal updates
\[
(20) \qquad y_k^L = \bar W_k \begin{bmatrix} z_{k-1} \\ 0 \end{bmatrix} = W_{k-1} z_{k-1} = y_{k-1}^L + \zeta_{k-1} w_{k-1},
\]
\[
(21) \qquad y_k^C = \bar W_k \bar z_k = W_{k-1} z_{k-1} + \bar\zeta_k \bar w_k = y_k^L + \bar\zeta_k \bar w_k.
\]
Because $\bar W_k$ is orthogonal, we have
\[
(22) \qquad \|y_k^L\|^2 = \|z_{k-1}\|^2 = \sum_{j=1}^{k-1} \zeta_j^2
\quad \text{and} \quad
\|y_k^C\|^2 = \|y_k^L\|^2 + \bar\zeta_k^2.
\]
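The scalar recurrences for $t_k$ and $z_{k-1}$ can be checked against dense triangular solves. The sketch below (our names; random positive scalars stand in for the Golub-Kahan quantities) verifies $L_k t_k = \beta_1 e_1$ and $M_{k-1} z_{k-1} = t_{k-1}$.

```python
import numpy as np

rng = np.random.default_rng(2)
k = 6
alpha = rng.uniform(0.5, 2.0, k)     # alpha_1..alpha_k
beta = rng.uniform(0.5, 2.0, k)      # beta_1 (= ||b||), then beta_2..beta_k
beta1 = beta[0]

taus = [beta1 / alpha[0]]            # tau_1 = beta_1/alpha_1
zetas, eps_list, eta_list = [], [], []
eps_bar, eta_prev = alpha[0], 0.0
for j in range(2, k + 1):            # reflection Q_{j-1,j}
    eps = np.hypot(eps_bar, beta[j - 1])        # eps_{j-1}
    c, s = eps_bar / eps, beta[j - 1] / eps     # c_j, s_j
    prev = eta_prev * zetas[-1] if zetas else 0.0
    zetas.append((taus[-1] - prev) / eps)       # zeta_{j-1}
    eps_list.append(eps)
    eta_prev = alpha[j - 1] * s                 # eta_j
    eta_list.append(eta_prev)
    eps_bar = -alpha[j - 1] * c                 # eps_bar_j
    taus.append(-beta[j - 1] * taus[-1] / alpha[j - 1])  # tau_j

L = np.diag(alpha) + np.diag(beta[1:], -1)      # L_k
t = np.array(taus)                               # t_k
M = np.diag(eps_list) + np.diag(eta_list[:k - 2], -1)  # M_{k-1}
z = np.array(zetas)                              # z_{k-1}
assert np.allclose(L @ t, beta1 * np.eye(k)[:, 0])
assert np.allclose(M @ z, t[:k - 1])
```

The assertions confirm that the forward recurrences reproduce the solutions of the two lower-bidiagonal systems defining $t_k$ and $z_{k-1}$.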
Thus $\|y_k^C\| \ge \|y_k^L\|$, $\|y_k^L\|$ is monotonically increasing, $\|y_\star - y_k^L\|$ is monotonically decreasing, and $\|y_\star - y_k^L\| \ge \|y_\star - y_k^C\|$, consistent with (Estrin et al., 2016, Theorem 6). Contrary to the update of $y_k^C$ in CRAIG, $y_k^L$ is updated along orthogonal directions and $y_k^C$ is found as an orthogonal update of $y_k^L$. The latter follows from the transfer procedure of SYMMLQ to the CG point described by Paige and Saunders (1975). At the next iteration,
\[
\begin{bmatrix} w_k & \bar w_{k+1} \end{bmatrix}
= \begin{bmatrix} \bar w_k & u_{k+1} \end{bmatrix}
\begin{bmatrix} c_{k+1} & s_{k+1} \\ s_{k+1} & -c_{k+1} \end{bmatrix}
\quad \Longrightarrow \quad
w_k = c_{k+1} \bar w_k + s_{k+1} u_{k+1},
\qquad
\bar w_{k+1} = s_{k+1} \bar w_k - c_{k+1} u_{k+1}.
\]
3.3. Residual estimates. We define the residual
\[
r_k := b - A x_k = b - A A^T U_k \bar y_k = U_{k+1}(\beta_1 e_1 - H_k \bar y_k)
\]
using line 1 of Algorithm 1 and (7), where $\bar y_k$ is either $\bar y_k^L$ or $\bar y_k^C$. Then for $k > 1$,
\[
T_k \bar y_k^L = L_k L_k^T \bar y_k^L
= L_k \bar M_k Q_k Q_k^T \begin{bmatrix} z_{k-1} \\ 0 \end{bmatrix}
= \begin{bmatrix} L_{k-1} & \\ \beta_k e_{k-1}^T & \alpha_k \end{bmatrix}
\begin{bmatrix} M_{k-1} & \\ \eta_k e_{k-1}^T & \bar\varepsilon_k \end{bmatrix}
\begin{bmatrix} z_{k-1} \\ 0 \end{bmatrix}
= \begin{bmatrix} L_{k-1} & \\ \beta_k e_{k-1}^T & \alpha_k \end{bmatrix}
\begin{bmatrix} t_{k-1} \\ \eta_k \zeta_{k-1} \end{bmatrix}
= \begin{bmatrix} \beta_1 e_1 \\ \beta_k \tau_{k-1} + \alpha_k \eta_k \zeta_{k-1} \end{bmatrix},
\]
where we use (16), the definition of $t_{k-1}$ and $z_{k-1}$, and (19). Note also that the identity $Q_k e_k = s_k e_{k-1} - c_k e_k$ yields
\[
e_k^T \bar y_k^L = e_k^T Q_k^T \begin{bmatrix} z_{k-1} \\ 0 \end{bmatrix} = s_k \zeta_{k-1}.
\]
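The residual identity $r_k = U_{k+1}(\beta_1 e_1 - H_k \bar y_k)$ holds for any coefficient vector $\bar y_k$ and can be verified numerically; in the dense sketch below (our names), a random $\bar y$ stands in for $\bar y_k^L$ or $\bar y_k^C$.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 6, 10, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# k steps of Algorithm 1, dense, for verification
U = np.zeros((m, k + 1)); V = np.zeros((n, k + 1))
alphas = np.zeros(k + 1); betas = np.zeros(k + 1)
betas[0] = np.linalg.norm(b); U[:, 0] = b / betas[0]
w = A.T @ U[:, 0]; alphas[0] = np.linalg.norm(w); V[:, 0] = w / alphas[0]
for j in range(k):
    u = A @ V[:, j] - alphas[j] * U[:, j]
    betas[j + 1] = np.linalg.norm(u); U[:, j + 1] = u / betas[j + 1]
    w = A.T @ U[:, j + 1] - betas[j + 1] * V[:, j]
    alphas[j + 1] = np.linalg.norm(w); V[:, j + 1] = w / alphas[j + 1]

# H_k = B_k L_k^T, with B_k of size (k+1) x k
L = np.diag(alphas[:k]) + np.diag(betas[1:k], -1)
B = np.vstack([L, betas[k] * np.eye(1, k, k - 1)])
H = B @ L.T

# b - A A^T U_k ybar = U_{k+1} (beta_1 e1 - H_k ybar) for any ybar
ybar = rng.standard_normal(k)
r1 = b - A @ (A.T @ (U[:, :k] @ ybar))
r2 = U @ (betas[0] * np.eye(k + 1)[:, 0] - H @ ybar)
assert np.allclose(r1, r2)
```

This confirms that the residual can be monitored through the small vector $\beta_1 e_1 - H_k \bar y_k$ alone, which is the basis of the cheap norm formulas that follow.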
The above and (8) combine to give
\[
(23) \qquad
r_k^L = U_{k+1}\left(\begin{bmatrix} \beta_1 e_1 \\ 0 \end{bmatrix}
- \begin{bmatrix} L_k L_k^T \bar y_k^L \\ \bar\beta_{k+1} e_k^T \bar y_k^L \end{bmatrix}\right)
= -U_{k+1} \begin{bmatrix} 0 \\ \beta_k \tau_{k-1} + \alpha_k \eta_k \zeta_{k-1} \\ \bar\beta_{k+1} s_k \zeta_{k-1} \end{bmatrix}
= -(\beta_k \tau_{k-1} + \alpha_k \eta_k \zeta_{k-1}) u_k - \bar\beta_{k+1} s_k \zeta_{k-1} u_{k+1}.
\]
By orthogonality, the residual norm is cheaply computable as
\[
\|r_k^L\|^2 = (\beta_k \tau_{k-1} + \alpha_k \eta_k \zeta_{k-1})^2 + (\bar\beta_{k+1} s_k \zeta_{k-1})^2.
\]
Similarly,
\[
(24) \qquad
r_k^C = U_{k+1}\left(\begin{bmatrix} \beta_1 e_1 \\ 0 \end{bmatrix}
- \begin{bmatrix} T_k \\ \bar\beta_{k+1} e_k^T \end{bmatrix} Q_k^T \bar z_k \right)
= -\bar\beta_{k+1} U_{k+1} \begin{bmatrix} 0 \\ (s_k e_{k-1}^T - c_k e_k^T) \bar z_k \end{bmatrix}
= -\bar\beta_{k+1} (s_k \zeta_{k-1} - c_k \bar\zeta_k) u_{k+1},
\]
where we use $T_k \bar y_k^C = \beta_1 e_1$ (by definition) and (19). Orthogonality of the $u_j$ yields orthogonality of the CRAIG residuals, a property of CG (Hestenes and Stiefel, 1952, Theorem 5:1). The CRAIG residual norm is simply
\[
\|r_k^C\| = \bar\beta_{k+1} |s_k \zeta_{k-1} - c_k \bar\zeta_k|.
\]
In the next section, alternative expressions of $\|r_k^L\|$ and $\|r_k^C\|$ emerge.

3.4. Updating $x = A^T y$. The definition $y_k = U_k \bar y_k$ and (4) yield $x_k = A^T y_k = A^T U_k \bar y_k = V_k L_k^T \bar y_k$. The LQ and CRAIG iterates may then be updated as
\[
(25) \qquad
x_k^L = V_k L_k^T \bar y_k^L
= V_k L_k^T Q_k^T \begin{bmatrix} z_{k-1} \\ 0 \end{bmatrix}
= V_k \bar M_k \begin{bmatrix} z_{k-1} \\ 0 \end{bmatrix}
= V_k \begin{bmatrix} M_{k-1} & \\ \eta_k e_{k-1}^T & \bar\varepsilon_k \end{bmatrix}
\begin{bmatrix} z_{k-1} \\ 0 \end{bmatrix}
= V_{k-1} M_{k-1} z_{k-1} + \eta_k \zeta_{k-1} v_k
= V_{k-1} t_{k-1} + \eta_k \zeta_{k-1} v_k,
\]
and similarly,
\[
(26) \qquad
x_k^C = V_k \begin{bmatrix} M_{k-1} & \\ \eta_k e_{k-1}^T & \bar\varepsilon_k \end{bmatrix}
\begin{bmatrix} z_{k-1} \\ \bar\zeta_k \end{bmatrix}
= x_k^L + \bar\varepsilon_k \bar\zeta_k v_k.
\]
Because $V_k$ is orthogonal, we have
\[
(27) \qquad
\|x_k^L\|^2 = \sum_{j=1}^{k-1} \tau_j^2 + (\eta_k \zeta_{k-1})^2
\quad \text{and} \quad
\|x_k^C\|^2 = \sum_{j=1}^{k-1} \tau_j^2 + (\eta_k \zeta_{k-1} + \bar\varepsilon_k \bar\zeta_k)^2.
\]
Both $x_k^L$ and $x_k^C$ may be found conveniently if we maintain the delayed iterate $\tilde x_{k-1} := \tau_1 v_1 + \cdots + \tau_{k-1} v_{k-1} = \tilde x_{k-2} + \tau_{k-1} v_{k-1}$, for then we have the orthogonal updates
\[
(28) \qquad
x_k^L = \tilde x_{k-1} + \eta_k \zeta_{k-1} v_k
\quad \text{and} \quad
x_k^C = \tilde x_{k-1} + (\eta_k \zeta_{k-1} + \bar\varepsilon_k \bar\zeta_k) v_k.
\]
Proposition 2. We have $\bar\varepsilon_1 \bar\zeta_1 = \tau_1$ and, for $k > 1$, $\eta_k \zeta_{k-1} + \bar\varepsilon_k \bar\zeta_k = \tau_k$. Therefore,
\[
x_k^C = \sum_{j=1}^k \tau_j v_j
\quad \text{and} \quad
r_k^C = -\beta_{k+1} \tau_k u_{k+1},
\]
which are the expressions for $x_k^C$ and $r_k^C$ in standard CRAIG.

Proof. The identity for $k = 1$ follows from the definitions of $\bar\varepsilon_1$, $\bar\zeta_1$, and $\tau_1$. By definition of $\bar\zeta_k$, we have $\bar\varepsilon_k \bar\zeta_k = \tau_k - \eta_k \zeta_{k-1}$, i.e., $\eta_k \zeta_{k-1} + \bar\varepsilon_k \bar\zeta_k = \tau_k$. The expressions for $x_k^C$ and $r_k^C$ follow from (28) and from (24), the definition of $\bar\beta_{k+1}$, and (17).

The expressions for $x_k^C$ and $r_k^C$ in Proposition 2 coincide with those in standard CRAIG. In particular, we recover the property that $x_k^C$ is updated along orthogonal directions, so that $\|x_k^C\|$ is monotonically increasing and $\|x_\star - x_k^C\|$ is monotonically decreasing, as stated by Paige (1974). Finally, (25) and Proposition 2 give $x_k^L = x_{k-1}^C + \eta_k \zeta_{k-1} v_k$.
Proposition 2 allows us to write $\tau_k - \eta_k \zeta_{k-1} = \bar\varepsilon_k \bar\zeta_k$. Because $\beta_k \tau_{k-1} = -\alpha_k \tau_k$, the LQ residual may be rewritten
\[
r_k^L = \alpha_k (\tau_k - \eta_k \zeta_{k-1}) u_k - \bar\beta_{k+1} s_k \zeta_{k-1} u_{k+1}
= \alpha_k \bar\varepsilon_k \bar\zeta_k u_k - \alpha_k \beta_{k+1} s_k \zeta_{k-1} u_{k+1},
\]
and correspondingly, $\|r_k^L\|^2 = \alpha_k^2 \bigl((\bar\varepsilon_k \bar\zeta_k)^2 + (\beta_{k+1} s_k \zeta_{k-1})^2\bigr)$. We are now able to establish a result that parallels Proposition 1.

Proposition 3. Let $x_\star$ be the solution to (1) and $y_\star$ the associated Lagrange multiplier with minimum norm, i.e., the minimum-norm solution of (2). The $k$th LNLQ iterates $y_k^L$ and $x_k^L$ solve
\[
(29) \qquad \min_y \ \|y - y_\star\| \quad \text{subject to} \quad y \in \mathrm{Range}(AA^T U_{k-1}),
\]
\[
(30) \qquad \min_x \ \|x - x_\star\|_{(AA^T)^\dagger} \quad \text{subject to} \quad x \in \mathrm{Range}(V_{k-1}),
\]
respectively. In addition, $y_k^L$ and $x_k^L$ solve
\[
(31) \qquad \min_y \ \|y\| \quad \text{subject to} \quad y \in \mathrm{Range}(U_k), \ b - AA^T y \perp \mathrm{Range}(U_{k-1}),
\]
\[
(32) \qquad \min_x \ \|x\|_{(AA^T)^\dagger} \quad \text{subject to} \quad x \in \mathrm{Range}(V_k), \ b - Ax \perp \mathrm{Range}(U_{k-1}).
\]

Proof. By definition, $\bar y_k^L$ solves (15). Hence there must exist $\bar t$ such that $\bar y_k^L = H_{k-1} \bar t$ and $H_{k-1}^T \bar y_k^L = \beta_1 e_1$. By definition of $H_{k-1}$ and (4), we have $y_k^L = U_k \bar y_k^L = U_k B_{k-1} L_{k-1}^T \bar t = A V_{k-1} L_{k-1}^T \bar t = A A^T U_{k-1} \bar t$. The above implies that $y_k^L$ is primal feasible for (29). Dual feasibility requires that $U_{k-1}^T A A^T (y - y_\star) = 0$, which is equivalent to $U_{k-1}^T r_k^L = 0$ because $AA^T y_\star = b$. The expression (23) confirms that dual feasibility is satisfied. With $y_k^L \in \mathrm{Range}(A)$, we have $y_k^L = (A^\dagger)^T x_k^L$ and then (30) follows from (29). Using (23), we see that $y_k^L$ is primal feasible for (31). Dual feasibility requires that $y_k^L = p + A A^T U_{k-1} q$ and $U_k^T p = 0$ for certain vectors $p$ and $q$, but those conditions are satisfied for $p := 0$ and $q := \bar t$. Since $y_k^L = (A^\dagger)^T x_k^L$, we obtain (32) from (31).

Note the subtle difference between the constraints of (13) and (32).

Corollary 1. For each $k$, $\|x_k^L\| \le \|x_k^C\|$ and $\|x_k^C - x_\star\| \le \|x_k^L - x_\star\|$.

Proof. By (4), $\mathrm{Range}(V_k) = \mathrm{Range}(A^T U_k)$ because $L_k$ is nonsingular. Thus the constraints of (32) amount to $b - A A^T U_k \bar y \in \mathrm{Range}(U_{k-1})^\perp$, for $\bar y$ such that $x = A^T U_k \bar y$. Because $\dim \mathrm{Range}(U_{k-1})^\perp$ decreases as $k$ increases, the objective $\|x\|$ increases monotonically. In addition, $\mathrm{Range}(U_k)^\perp \subset \mathrm{Range}(U_{k-1})^\perp$ and therefore $\|x_k^L\| \le \|x_k^C\|$. If we compare (11) with (30), we see that $\|x_k^C - x_\star\| \le \|x_k^L - x_\star\|$ because $\mathrm{Range}(V_{k-1}) \subset \mathrm{Range}(V_k)$.
3.5. Complete algorithm. Algorithm 2 summarizes LNLQ. Note that if only the $x$ part of the solution is desired, there is no need to initialize and update the vectors $w_k$, $\bar w_k$, $y_k^L$ and $y_k^C$, unless one wants to retrieve $x$ as $A^T y$ at the end of the procedure. Similarly, if only the $y$ part of the solution is desired, there is no need to initialize and update the vectors $x_k^L$ and $x_k^C$. The update for $x_{k+1}^C$ in line 18 of Algorithm 2 can be used even if the user wishes to dispense with updating $x_k^L$.
Algorithm 2 LNLQ
1: $\beta_1 u_1 = b$, $\alpha_1 v_1 = A^T u_1$  // begin Golub-Kahan process
2: $\bar\varepsilon_1 = \alpha_1$, $\tau_1 = \beta_1/\alpha_1$, $\bar\zeta_1 = \tau_1/\bar\varepsilon_1$  // begin LQ factorization
3: $w_1 = 0$, $\bar w_1 = u_1$
4: $y_1^L = 0$, $y_1^C = \bar\zeta_1 \bar w_1$
5: $x_1^L = 0$, $x_1^C = \tau_1 v_1$
6: for $k = 1, 2, \dots$ do
7:   $\beta_{k+1} u_{k+1} = A v_k - \alpha_k u_k$  // continue Golub-Kahan process
8:   $\alpha_{k+1} v_{k+1} = A^T u_{k+1} - \beta_{k+1} v_k$
9:   $\varepsilon_k = (\bar\varepsilon_k^2 + \beta_{k+1}^2)^{1/2}$  // continue LQ factorization
10:  $c_{k+1} = \bar\varepsilon_k/\varepsilon_k$, $s_{k+1} = \beta_{k+1}/\varepsilon_k$
11:  $\eta_{k+1} = \alpha_{k+1} s_{k+1}$, $\bar\varepsilon_{k+1} = -\alpha_{k+1} c_{k+1}$
12:  $\tau_{k+1} = -\beta_{k+1} \tau_k/\alpha_{k+1}$
13:  $\zeta_k = c_{k+1} \bar\zeta_k$, $\bar\zeta_{k+1} = (\tau_{k+1} - \eta_{k+1} \zeta_k)/\bar\varepsilon_{k+1}$  // prepare to update $y$
14:  $w_k = c_{k+1} \bar w_k + s_{k+1} u_{k+1}$, $\bar w_{k+1} = s_{k+1} \bar w_k - c_{k+1} u_{k+1}$
15:  $y_{k+1}^L = y_k^L + \zeta_k w_k$  // update $y$
16:  $y_{k+1}^C = y_{k+1}^L + \bar\zeta_{k+1} \bar w_{k+1}$
17:  $x_{k+1}^L = x_k^C + \eta_{k+1} \zeta_k v_{k+1}$  // update $x$
18:  $x_{k+1}^C = x_k^C + \tau_{k+1} v_{k+1}$
19: end for
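Algorithm 2 admits a compact NumPy sketch. The names below are ours, and the sketch omits stopping rules, regularization, and breakdown handling; with full row rank and exact arithmetic, the CRAIG (transfer) point is exact after $m$ iterations, which we use for testing.

```python
import numpy as np

def lnlq(A, b, iters):
    """A dense sketch of Algorithm 2; returns the LQ and CRAIG points
    (x^L, y^L, x^C, y^C) after `iters` loop iterations (k = 1 + iters)."""
    beta = np.linalg.norm(b); u = b / beta              # beta_1, u_1
    w_v = A.T @ u
    alpha = np.linalg.norm(w_v); v = w_v / alpha        # alpha_1, v_1
    eps_bar, tau = alpha, beta / alpha                  # eps_bar_1, tau_1
    zeta_bar = tau / eps_bar                            # zeta_bar_1
    w, w_bar = np.zeros_like(b), u.copy()
    yL, yC = np.zeros_like(b), zeta_bar * w_bar
    xL, xC = np.zeros_like(v), tau * v
    for _ in range(iters):
        u_new = A @ v - alpha * u
        beta = np.linalg.norm(u_new); u = u_new / beta  # beta_{k+1}, u_{k+1}
        v_new = A.T @ u - beta * v
        alpha = np.linalg.norm(v_new); v = v_new / alpha
        eps = np.hypot(eps_bar, beta)
        c, s = eps_bar / eps, beta / eps                # c_{k+1}, s_{k+1}
        eta, eps_bar = alpha * s, -alpha * c
        tau_new = -beta * tau / alpha                   # tau_{k+1}
        zeta = c * zeta_bar                             # zeta_k
        zeta_bar = (tau_new - eta * zeta) / eps_bar     # zeta_bar_{k+1}
        w, w_bar = c * w_bar + s * u, s * w_bar - c * u
        yL = yL + zeta * w
        yC = yL + zeta_bar * w_bar
        xL = xC + eta * zeta * v                        # x_{k+1}^L = x_k^C + ...
        xC = xC + tau_new * v
        tau = tau_new
    return xL, yL, xC, yC
```

The tests also check the error inequality of Corollary 1 at an intermediate iteration.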
4. Regularization. The regularized least-norm problem is
\[
(33) \qquad \min_{x \in \mathbb{R}^n, \, s \in \mathbb{R}^m} \ \tfrac{1}{2}(\|x\|^2 + \|s\|^2) \quad \text{subject to} \quad Ax + \lambda s = b,
\]
which is compatible for any $\lambda \ne 0$. Saunders (1995, Result 7) states that applying Algorithm 1 to $\hat A := \begin{bmatrix} A & \lambda I \end{bmatrix}$ with initial vector $b$ preserves $U_k$. We find the corresponding $\hat V_k$ and lower bidiagonal $\hat L_k$ by comparing the identities
\[
(34) \qquad
\begin{bmatrix} A^T \\ \lambda I \end{bmatrix} U_k
= \begin{bmatrix} V_k & \\ & U_k \end{bmatrix}
\begin{bmatrix} L_k^T \\ \lambda I \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} A^T \\ \lambda I \end{bmatrix} U_k = \hat V_k \hat L_k^T,
\]
the first of which results from (4) and the second from Algorithm 1 applied to $\hat A$. At iteration $k$, we apply reflections $\hat Q_k$ designed to zero out the $\lambda I$ block, resulting in
\[
\begin{bmatrix} V_k & \\ & U_k \end{bmatrix}
\begin{bmatrix} L_k^T \\ \lambda I \end{bmatrix}
= \begin{bmatrix} V_k & \\ & U_k \end{bmatrix} \hat Q_k^T \hat Q_k
\begin{bmatrix} L_k^T \\ \lambda I \end{bmatrix}
= \begin{bmatrix} \hat V_k & \hat Y_k \end{bmatrix}
\begin{bmatrix} \hat L_k^T \\ 0 \end{bmatrix}
= \hat V_k \hat L_k^T.
\]
Saunders (1995) uses $\hat Q_k$ to describe CRAIG with regularization under the name extended CRAIG. If we initialize $\lambda_1 := \lambda$, the first few reflections may be illustrated as
\[
\begin{bmatrix}
\alpha_1 & & & & \lambda_1 & & & \\
\beta_2 & \alpha_2 & & & & \lambda & & \\
& \beta_3 & \alpha_3 & & & & \lambda & \\
& & \beta_4 & \alpha_4 & & & & \lambda
\end{bmatrix}
\to
\begin{bmatrix}
\hat\alpha_1 & & & & 0 & & & \\
\hat\beta_2 & \alpha_2 & & & \hat\lambda_2 & \lambda & & \\
& \beta_3 & \alpha_3 & & & & \lambda & \\
& & \beta_4 & \alpha_4 & & & & \lambda
\end{bmatrix}
\to
\begin{bmatrix}
\hat\alpha_1 & & & & 0 & & & \\
\hat\beta_2 & \alpha_2 & & & 0 & \lambda_2 & & \\
& \beta_3 & \alpha_3 & & & & \lambda & \\
& & \beta_4 & \alpha_4 & & & & \lambda
\end{bmatrix}
\to \cdots \to
\begin{bmatrix}
\hat\alpha_1 & & & & 0 & & & \\
\hat\beta_2 & \hat\alpha_2 & & & 0 & 0 & & \\
& \hat\beta_3 & \hat\alpha_3 & & & 0 & 0 & \\
& & \hat\beta_4 & \alpha_4 & & & \hat\lambda_4 & \lambda
\end{bmatrix},
\]
where, in the original illustration, shaded elements are those participating in the current reflection and grayed-out elements have not yet been used. Two reflections per iteration are necessary, and the situation at iteration $k$ may be described as
\[
\begin{bmatrix} \alpha_k & \lambda_k & 0 \\ \beta_{k+1} & 0 & \lambda \end{bmatrix}
\begin{bmatrix} \hat c_k & \hat s_k & \\ \hat s_k & -\hat c_k & \\ & & 1 \end{bmatrix}
= \begin{bmatrix} \hat\alpha_k & 0 & 0 \\ \hat\beta_{k+1} & \hat\lambda_{k+1} & \lambda \end{bmatrix},
\qquad
\begin{bmatrix} \hat\alpha_k & 0 & 0 \\ \hat\beta_{k+1} & \hat\lambda_{k+1} & \lambda \end{bmatrix}
\begin{bmatrix} 1 & & \\ & \tilde c_k & \tilde s_k \\ & \tilde s_k & -\tilde c_k \end{bmatrix}
= \begin{bmatrix} \hat\alpha_k & 0 & 0 \\ \hat\beta_{k+1} & \lambda_{k+1} & 0 \end{bmatrix},
\]
where the rows shown are $k$ and $k+1$ and the columns are $k$, $2k$ and $2k+1$. The first reflection is defined by
\[
\hat\alpha_k := \sqrt{\alpha_k^2 + \lambda_k^2}, \qquad
\hat c_k := \alpha_k/\hat\alpha_k, \qquad
\hat s_k := \lambda_k/\hat\alpha_k,
\]
and results in $\hat\beta_{k+1} = \hat c_k \beta_{k+1}$ and $\hat\lambda_{k+1} = \hat s_k \beta_{k+1}$. The second reflection defines
\[
\lambda_{k+1} := \sqrt{\hat\lambda_{k+1}^2 + \lambda^2}, \qquad
\tilde c_k := \hat\lambda_{k+1}/\lambda_{k+1}, \qquad
\tilde s_k := \lambda/\lambda_{k+1},
\]
and does not create a new nonzero. Only the first reflection contributes to $\hat V_k$:
\[
(35) \qquad
\begin{bmatrix} v_k & 0 \\ 0 & u_k \end{bmatrix}
\begin{bmatrix} \hat c_k & \hat s_k \\ \hat s_k & -\hat c_k \end{bmatrix}
= \begin{bmatrix} \hat c_k v_k & \hat s_k v_k \\ \hat s_k u_k & -\hat c_k u_k \end{bmatrix},
\]
where column $k$ is $\hat v_k$.

Iteration $k$ of LNLQ with regularization solves (15), but $H_{k-1}^T$ is then the top $(k-1) \times k$ submatrix of
\[
\begin{bmatrix} L_k & \lambda I \end{bmatrix}
\begin{bmatrix} L_k^T \\ \lambda I \end{bmatrix}
= L_k L_k^T + \lambda^2 I = T_k + \lambda^2 I.
\]
In (16), we compute the LQ factorization of $\hat L_k^T$ instead of $L_k^T$, but the details are identical, as are the updates of $y_k^L$ in (20) and $y_k^C$ in (21). Because $U_k$ is unchanged by regularization, the residual expressions (23) and (24) remain valid. Subsequently,
\[
\begin{bmatrix} x_k^L \\ s_k^L \end{bmatrix}
= \begin{bmatrix} A^T \\ \lambda I \end{bmatrix} U_k \bar y_k
= \hat V_k \hat L_k^T \bar y_k,
\]
but we are only interested in the top half, $x_k^L$. Let the top $n \times k$ submatrix of $\hat V_k$ be
\[
\hat W_k := \begin{bmatrix} \hat w_1 & \cdots & \hat w_k \end{bmatrix}
= \begin{bmatrix} I & 0 \end{bmatrix} \hat V_k
= \begin{bmatrix} V_k & 0 \end{bmatrix} \hat Q_k^T.
\]
We conclude from (35) that $\hat w_j = \hat c_j v_j$ for $j = 1, \dots, k$. The update (26) remains valid with $v_k$ replaced by $\hat w_k$.
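The regularized setting (33) can be illustrated numerically by forming $\hat A = [A \ \lambda I]$ explicitly and comparing the minimum-norm solution with the optimality conditions $x = A^T y$, $s = \lambda y$, $(AA^T + \lambda^2 I) y = b$. This is a dense sketch only; a practical implementation would never form $\hat A$, and the data below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, lam = 5, 8, 0.3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Problem (33) is the least-norm problem for the augmented matrix [A  lam*I]
A_hat = np.hstack([A, lam * np.eye(m)])
xs = np.linalg.pinv(A_hat) @ b       # min-norm solution (x, s)
x, s = xs[:n], xs[n:]

# The multiplier y solves (A A^T + lam^2 I) y = b, with x = A^T y, s = lam y
y = np.linalg.solve(A @ A.T + lam**2 * np.eye(m), b)
assert np.allclose(x, A.T @ y)
assert np.allclose(s, lam * y)
assert np.allclose(A @ x + lam * s, b)
```

The assertions confirm that the unique KKT point of (33) coincides with the min-norm solution for the augmented matrix, which is exactly the structure the extended reflections above exploit.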
5. Error upper bounds.
5.1. Upper bound on $\|y_\star - y_k^L\|$. By orthogonality, $\|y_\star - y_k^L\|^2 = \|y_\star\|^2 - \|y_k^L\|^2$. If $A$ has full row rank, $y_\star = (AA^T)^{-1} b$ and $\|y_\star\|^2 = b^T (AA^T)^{-2} b$. If we define
\[
f(AA^T) := \sum_{i=1}^m f(\sigma_i^2) q_i q_i^T
\]
for any given $f : (0, \infty) \to \mathbb{R}$, where $q_i$ is the $i$th left singular vector of $A$, then $\|y_\star\|^2 = b^T f(AA^T) b$ with $f(\xi) := \xi^{-2}$. More generally, as $y_\star$ is the minimum-norm solution of (2), it may be expressed as
\[
y_\star = \sum_{i=1}^r \sigma_i^{-2} (q_i^T b) \, q_i,
\]
where $\sigma_r$ is the smallest nonzero singular value of $A$, which amounts to redefining $f(\xi) := 0$ at $\xi = 0$. Because $b = \beta_1 u_1$, we may write
\[
\|y_\star\|^2 = \beta_1^2 \sum_{i=1}^m f(\sigma_i^2) \mu_i^2, \qquad \mu_i := q_i^T u_1, \quad i = 1, \dots, m.
\]
We obtain an upper bound on $\|y_\star\|$ by viewing the sum above as a Riemann-Stieltjes integral for a well-chosen Stieltjes measure and approximating the integral via a Gauss-Radau quadrature. We do not repeat the details here and refer the reader to Golub and Meurant (1997) for background. The fixed Gauss-Radau quadrature node is set to a prescribed $\sigma_{est} \in (0, \sigma_r)$. We follow Estrin et al. (2017) and modify $L_k$ rather than $T_k$. Let
\[
(36) \qquad \tilde L_k := \begin{bmatrix} L_{k-1} & 0 \\ \beta_k e_{k-1}^T & \omega_k \end{bmatrix}.
\]
Note that $\tilde L_k$ differs from $L_k$ in its $(k,k)$th element only, and
\[
\tilde T_k := \tilde L_k \tilde L_k^T
= \begin{bmatrix} T_{k-1} & \bar\beta_k e_{k-1} \\ \bar\beta_k e_{k-1}^T & \beta_k^2 + \omega_k^2 \end{bmatrix}
\]
(with $\bar\beta_k$ defined in (9)) also differs from $T_k$ in its $(k,k)$th element only. The Poincaré separation theorem ensures that the singular values of $L_k$ lie in $(\sigma_r, \sigma_1)$. The Cauchy interlace theorem for singular values ensures that it is possible to select $\omega_k$ so that the smallest singular value of (36) is $\sigma_{est}$. The next result derives from (Golub and Meurant, 1997, Theorems 6.4 and 12.6).
Theorem 1 (Estrin et al., 2017, Theorem 4). Let $f : [0, \infty) \to \mathbb{R}$ be such that $f^{(2j+1)}(\xi) < 0$ for all $\xi \in (\sigma_r^2, \sigma_1^2)$ and all $j \ge 0$. Fix $\sigma_{est} \in (0, \sigma_r)$. Let $L_k$ be the bidiagonal generated after $k$ steps of Algorithm 1 and $\omega_k > 0$ be chosen so that the smallest singular value of (36) is $\sigma_{est}$. Then
\[
b^T f(AA^T) b \le \beta_1^2 e_1^T f(\tilde L_k \tilde L_k^T) e_1.
\]
The procedure for identifying $\omega_k$ is identical to that of Estrin et al. (2017) and yields $\omega_k = \sqrt{\sigma_{est}^2 - \sigma_{est} \beta_k \theta_{2k-2}}$, where $\theta_{2k-2}$ is an element of a related eigenvector. Application of Theorem 1 to $f(\xi) := \xi^{-2}$ with the convention that $f(0) := 0$ provides an upper bound on $\|y_\star\|^2$.

Corollary 2. Fix $\sigma_{est} \in (0, \sigma_r)$. Let $L_k$ be the bidiagonal generated after $k$ steps of Algorithm 1 and $\omega_k > 0$ be chosen so that the smallest singular value of (36) is $\sigma_{est}$. Then
\[
\|y_\star\|^2 \le \beta_1^2 e_1^T (\tilde L_k \tilde L_k^T)^{-2} e_1.
\]
In order to evaluate the upper bound stated in Corollary 2, we modify the LQ factorization (16) to
\[
\tilde L_k^T = \begin{bmatrix} L_{k-1}^T & \beta_k e_{k-1} \\ 0 & \omega_k \end{bmatrix}
= \begin{bmatrix} M_{k-1} & \\ \tilde\eta_k e_{k-1}^T & \tilde\varepsilon_k \end{bmatrix} Q_k
= \tilde M_k Q_k,
\]
where $\tilde\eta_k = \omega_k s_k$ and $\tilde\varepsilon_k = -\omega_k c_k$. Define $\tilde t_k$ and $\tilde z_k$ such that
\[
(37) \qquad \tilde L_k \tilde t_k = \beta_1 e_1 \quad \text{and} \quad \tilde M_k \tilde z_k = \tilde t_k.
\]
The updated factorization and the definition of $f$ yield
\[
\|y_\star\|^2 \le \beta_1^2 \|(\tilde L_k \tilde M_k Q_k)^{-1} e_1\|^2
= \beta_1^2 \|\tilde M_k^{-1} \tilde L_k^{-1} e_1\|^2
= \|\tilde M_k^{-1} \tilde t_k\|^2 = \|\tilde z_k\|^2.
\]
Comparing with the definition of $t_k$ and $z_k$ in subsection 3.2 reveals that $\tilde t_k = (t_{k-1}, \tilde\tau_k)$ and $\tilde z_k = (z_{k-1}, \tilde\zeta_k)$ with $\tilde\tau_k = -\beta_k \tau_{k-1}/\omega_k$ and $\tilde\zeta_k = (\tilde\tau_k - \tilde\eta_k \zeta_{k-1})/\tilde\varepsilon_k$. Combining with (22) yields the bound
\[
(38) \qquad \|y_\star - y_k^L\|^2 = \|y_\star\|^2 - \|z_{k-1}\|^2
\le \|z_{k-1}\|^2 + \tilde\zeta_k^2 - \|z_{k-1}\|^2 = \tilde\zeta_k^2.
\]
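Corollary 2 can be checked numerically. The sketch below selects $\omega_k$ by bisection on the smallest singular value of (36), in place of the eigenvector-based formula above, and verifies that the resulting quantity bounds $\|y_\star\|^2$. It is a dense sketch for small problems only, and the choice $\sigma_{est} = 0.9\,\sigma_r$ is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, k = 6, 12, 4
A = rng.standard_normal((m, n))      # full row rank almost surely
b = rng.standard_normal(m)
sigma_r = np.linalg.svd(A, compute_uv=False).min()
sigma_est = 0.9 * sigma_r

# k steps of Algorithm 1, keeping the scalars
alphas, betas = [], []
beta = np.linalg.norm(b); u = b / beta; betas.append(beta)
w = A.T @ u; alpha = np.linalg.norm(w); v = w / alpha; alphas.append(alpha)
for _ in range(k - 1):
    u_new = A @ v - alpha * u
    beta = np.linalg.norm(u_new); u = u_new / beta; betas.append(beta)
    w = A.T @ u - beta * v
    alpha = np.linalg.norm(w); v = w / alpha; alphas.append(alpha)

L = np.diag(alphas) + np.diag(betas[1:], -1)

def sigma_min(omega):
    """Smallest singular value of (36) as a function of omega_k."""
    Lt = L.copy(); Lt[-1, -1] = omega
    return np.linalg.svd(Lt, compute_uv=False).min()

# bisection: sigma_min(0) = 0 < sigma_est < sigma_min(alpha_k)
lo, hi = 0.0, alphas[-1]
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if sigma_min(mid) < sigma_est:
        lo = mid
    else:
        hi = mid
Lt = L.copy(); Lt[-1, -1] = 0.5 * (lo + hi)

# Corollary 2: ||y*||^2 <= beta_1^2 e1^T (Lt Lt^T)^{-2} e1
e1 = np.zeros(k); e1[0] = 1.0
z = np.linalg.solve(Lt @ Lt.T, e1)
bound = betas[0]**2 * (z @ z)
y_star = np.linalg.solve(A @ A.T, b)
assert y_star @ y_star <= bound * (1 + 1e-6)
```

With $\sigma_{est}$ strictly below $\sigma_r$, the Gauss-Radau construction guarantees the inequality, and the bound tightens as $k$ grows.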
5.2. Upper bound on $\|y_\star - y_k^C\|$. Estrin et al. (2016, Theorem 6) establish that $\|y_\star - y_k^C\| \le \|y_\star - y_k^L\|$, so that the bound from the previous section applies. However, with $\bar\zeta_k$ as defined in subsection 3.2, they also derive the improved bound
\[
(39) \qquad \|y_\star - y_k^C\|^2 \le \tilde\zeta_k^2 - \bar\zeta_k^2.
\]
Estrin et al. (2016) provide further refinement over this bound by using a sliding-window approach: $O(d)$ scalars can be stored at each iteration, and for $O(d)$ additional work a quantity $\theta_k^{(d)}$ can be computed so that
\[
(40) \qquad \|y_\star - y_k^C\|^2 \le \tilde\zeta_k^2 - \bar\zeta_k^2 - 2\theta_k^{(d)}.
\]
Note that the definitions of $c_k$, $s_k$, $\zeta_k$, and $\bar\zeta_k$ match those in (Estrin et al., 2016).
5.3. Upper bound on }x‹ ´ xC k }. Assume temporarily that A has full column 2 2 C 2 rank. By orthogonality in (25), }x‹ ´ xC k } “ }x‹ } ´ }xk } . We may then use
370
}x‹ }2 “ }ATy‹ }2 “ }y‹ }2AAT “ }b}2pAAT q´1 .
371 372
Applying Theorem 1 to f(ξ) := ξ⁻¹, redefined such that f(0) := 0, provides an upper bound on ‖x⋆‖² in the vein of (Golub and Meurant, 1997, Theorems 6.4 and 12.1).

Corollary 3. Fix σ_est ∈ (0, σ̃). Let L_k be the bidiagonal generated after k steps of Algorithm 1 and ω_k > 0 be chosen so that the smallest singular value of (36) is σ_est. Then

    ‖x⋆‖² ≤ β_1² e_1ᵀ (L̃_k L̃_kᵀ)⁻¹ e_1.

We use (37) to evaluate the bound given by Corollary 3 as

    β_1² e_1ᵀ (L̃_k L̃_kᵀ)⁻¹ e_1 = ‖β_1 L̃_k⁻¹ e_1‖² = ‖t̃_k‖²,

which leads to the bound

(41)    ‖x⋆ − x_k^C‖² ≤ ‖t̃_k‖² − ‖t_k‖² = τ̃_k² − τ_k².

This bound must coincide with that of Arioli (2013), which he derived using the Cholesky factorization of T_k. Note that Arioli (2013, Equation (4.4)) proposes the error bound

(42)    ‖y⋆ − y_k^C‖ = ‖L_n⁻¹ (x⋆ − x_k^C)‖ ≤ σ_min(L_n)⁻¹ ‖x⋆ − x_k^C‖ ≤ σ̃⁻¹ ‖x⋆ − x_k^C‖.

It may be possible to improve on (42) by maintaining a running estimate of σ_min(L_k), such as the estimate min(ε_1, …, ε_{k−1}, ε̄_k) discussed by Stewart (1999).
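The mechanism behind (42) is the elementary inequality ‖L⁻¹v‖ ≤ σ_min(L)⁻¹‖v‖, valid for any nonsingular L and vector v. A quick numerical sanity check with random illustrative data:

```python
import numpy as np

# For any v, ||L^{-1} v|| <= ||v|| / sigma_min(L).
rng = np.random.default_rng(1)
n = 6
L = np.tril(rng.standard_normal((n, n))) + 3.0 * np.eye(n)  # nonsingular
v = rng.standard_normal(n)

w = np.linalg.solve(L, v)
sigma_min = np.linalg.svd(L, compute_uv=False).min()
lhs = np.linalg.norm(w)
bound = np.linalg.norm(v) / sigma_min
```

Replacing σ_min(L) by any underestimate σ̃ only loosens the bound, which is why an underestimate of σ_min(A) suffices in (42).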
5.4. Upper bound on ‖x⋆ − x_k^L‖. Using x_k^L = x_{k−1}^C + η_k ζ_{k−1} v_k, we have

    ‖x⋆ − x_k^L‖² = ‖x⋆ − V_n (t_{k−1} ; η_k ζ_{k−1} ; 0)‖² = ‖x⋆ − x_k^C‖² + (τ_k − η_k ζ_{k−1})².

Thus, using the error bound in (41) we obtain

(43)    ‖x⋆ − x_k^L‖² ≤ τ̃_k² − τ_k² + (τ_k − η_k ζ_{k−1})².

6. Preconditioning. As with other Golub–Kahan-based methods, convergence depends on the distribution of {σ_i(A)}. We therefore consider the equivalent system N^{−1/2} A Aᵀ N^{−1/2} (N^{1/2} y) = N^{−1/2} b, where N^{−1/2} A has clustered singular values. For the unregularized problem (2), to run preconditioned LNLQ efficiently we replace Algorithm 1 by the generalized Golub–Kahan process (Arioli, 2013, Algorithm 3.1). We seek a preconditioner N ≻ 0 such that N ≈ A Aᵀ, and require no changes to the algorithm except in how we generate the vectors u_k and v_k. This is equivalent to applying the block-diagonal preconditioner blkdiag(I, N⁻¹) to the saddle-point system

    [ −I  Aᵀ ; A  0 ] [ x ; y ] = [ 0 ; b ].
For a regularized system with λ ≠ 0, we need to solve the 2×2 quasi-definite system

(44)    [ −I  Aᵀ ; A  λ²I ] [ x ; y ] = [ 0 ; b ].
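Eliminating x from the first block row of (44) shows that it is equivalent to the regularized normal equations (AAᵀ + λ²I)y = b with x = Aᵀy. A small numpy check of this equivalence (illustrative random data):

```python
import numpy as np

# Solve [-I A'; A lam^2 I][x; y] = [0; b] directly and compare with
# (AA' + lam^2 I) y = b, x = A'y.
rng = np.random.default_rng(2)
m, n, lam = 4, 7, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

K = np.block([[-np.eye(n), A.T],
              [A, lam**2 * np.eye(m)]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(n), b]))
x, y = sol[:n], sol[n:]

y_ref = np.linalg.solve(A @ A.T + lam**2 * np.eye(m), b)
```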
Fig. 1. Error in x (top) and y (bottom) along the LNLQ (left) and CRAIG iterations (right). The blue line is the exact error. The red line represents error bounds using quadrature, the green line is the error bound (42) from Arioli (2013), while the yellow and magenta lines employ the sliding window improvement (40) with d “ 5 and 10.
We cannot precondition directly with the generalized Golub–Kahan process as before, because properties analogous to (34) do not hold for N ≠ I. Instead we must precondition the equivalent 3×3 block system

    [ −I  0  Aᵀ ; 0  −I  λI ; A  λI  0 ] [ x ; s ; y ] = [ 0 ; 0 ; b ]

with the block-diagonal preconditioner blkdiag(I, I, N⁻¹), where N ≈ A Aᵀ + λ²I is a symmetric positive definite preconditioner. In effect, we must run preconditioned LNLQ directly on Â = [ A  λI ].

7. Implementation and numerical experiments. We implemented LNLQ in Matlab¹, including the relevant error bounds. The exact solution for each experiment is computed using Matlab's backslash operator on the augmented system (1). Mentions of CRAIG below refer to the transfer from the LNLQ point to the CRAIG point.
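The same reference solution can be obtained in Python. The sketch below (our own illustrative stand-in for the Matlab backslash computation, with random data) solves the augmented system directly and cross-checks against the minimum-norm solution from the pseudoinverse:

```python
import numpy as np

# Reference least-norm solution via a direct solve of the augmented system
# [-I A'; A 0][x; y] = [0; b].
rng = np.random.default_rng(3)
m, n = 6, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

K = np.block([[-np.eye(n), A.T],
              [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(n), b]))
x_ref = sol[:n]

# Cross-check: the minimum-norm solution of a consistent system.
x_pinv = np.linalg.pinv(A) @ b
```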
7.1. UFL problem. Matrix Meszaros/scagr7-2c from the UFL collection (Davis and Hu, 2011) has size 2447 × 3479. We set b = e/√m, the normalized vector of ones. For LNLQ and CRAIG we record the error in both x and y at each iteration using the exact solution, and the error bounds discussed above using σ_est = (1 − 10⁻¹⁰) σ_min(A), where σ_min(A) was provided by the UFL collection. The same σ_est is used to evaluate the bound (42). Figure 1 records the results.

We see that the LNLQ error bounds are tight, even though the error in x is not monotonic. In accordance with Proposition 1, the CRAIG error is lower than the LNLQ error in x, and the same holds in y. The CRAIG bound in x is tight until the Gauss–Radau quadrature becomes inaccurate, a phenomenon also observed by Meurant and Tichý (2014, 2015). Regarding the CRAIG error in y, we see that the error bounds from (39) and (42) are close to each other, with (42) being slightly tighter. We observed that the simpler bound (42) nearly overlaps with the bound (39) on other problems. However, (40) provides the ability to tighten (39), and even small window sizes such as d = 5 or 10 can improve the bound significantly until the Gauss–Radau quadrature becomes inaccurate. Thus, the sliding window approach can be useful when an accurate estimate of σ_min(A) is available and early termination is relevant, for example when only a crude approximation to x and y is required.

¹ Available from github.com/restrin/LinearSystemSolvers
7.2. Fletcher's penalty function. We now apply LNLQ to least-norm problems arising from the use of Fletcher's exact penalty function (Fletcher, 1973; Estrin et al., 2018) to solve PDE-constrained control problems. We consider the problem

(45)    minimize_{u,z}  ½ ∫_Ω (u − u_d)² dx + ½ α ∫_Ω z² dx
        subject to  ∇·(z∇u) = −sin(ωx_1) sin(ωx_2)  in Ω,    u = 0  on ∂Ω,

where ω = π − 1/8, Ω = [−1, 1]², and α ≥ 0 is a small regularization parameter. Here, u might represent the temperature distribution on a square metal plate, u_d is the observed temperature, and we aim to determine the diffusion coefficients z so that u matches the observations in a least-squares sense. We discretize (45) using finite elements with triangular cells, and obtain the equality-constrained problem

    minimize_ū  f(ū)    subject to  c(ū) = 0.

Let p be the number of cells along one dimension, so that u ∈ R^(p²) and z ∈ R^((p+2)²) are the discretizations of u and z, and ū := (u, z). We use p = 31 in the experiments below. Let A(ū) := [ A_u  A_z ] be the Jacobian of c(ū).

For a given penalty parameter σ > 0, Fletcher's exact penalty approach is to

    minimize_ū  φ_σ(ū) := f(ū) − c(ū)ᵀ y_σ(ū),
    where  y_σ(ū) ∈ arg min_y  ½ ‖∇f(ū) − A(ū)ᵀ y‖² + σ c(ū)ᵀ y.

In order to evaluate φ_σ(ū) and ∇φ_σ(ū), we must solve systems of the form (2) with b = −c(ū) and A = A(ū). Note that by controlling the error in the solution of (2), we control the inexactness in the computation of the penalty function value and gradient. In our experiments, we evaluate b and A at ū = e, the vector of ones.

We first apply LNLQ and CRAIG without preconditioning. The results are summarized in Figure 2, where we observe trends like those in the previous section. The LNLQ bounds are quite accurate because of our accurate estimate of the smallest singular value, even though the LNLQ error in x is not monotonic. The CRAIG bound in x is quite accurate until the Gauss–Radau quadrature becomes unstable, which results in a looser bound. The latter impacts the CRAIG error bound for y in the form of the plateau after iteration 250. The error bound (42) is slightly tighter than (39), while if we use (40) with d = 20, we achieve a tighter bound until the plateau occurs.
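For intuition, when A has full row rank the multiplier estimate y_σ has the closed form AAᵀy = Ag − σc, where g stands for ∇f; each evaluation of φ_σ therefore amounts to one system of the form (2). A toy version with random stand-ins for A(ū), ∇f(ū), and c(ū):

```python
import numpy as np

# Toy evaluation of y_sigma = argmin_y 0.5*||g - A'y||^2 + sigma*c'y.
# Setting the gradient to zero gives AA' y = A g - sigma c.
rng = np.random.default_rng(4)
m, n, sigma = 5, 9, 0.5
A = rng.standard_normal((m, n))   # stand-in for the Jacobian A(u)
g = rng.standard_normal(n)        # stand-in for grad f(u)
c = rng.standard_normal(m)        # stand-in for c(u)

y = np.linalg.solve(A @ A.T, A @ g - sigma * c)

# Stationarity of the subproblem at y should hold:
grad = A @ (A.T @ y - g) + sigma * c
```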
Fig. 2. Error in x (top) and y (bottom) along the LNLQ iterations (left) and CRAIG iterations (right). The red line represents error bounds using quadrature, the yellow line uses a sliding window of d “ 20, and the green line is (42).
We now use the preconditioner N = A_u A_uᵀ, which corresponds to two solves of Poisson's equation with fixed diffusion coefficients. Because σ_min((A_u A_uᵀ)⁻¹ A Aᵀ) = σ_min(I + (A_u A_uᵀ)⁻¹ A_z A_zᵀ) ≥ 1, we choose σ_est = 1. Recall that the y-error is now measured in the N-energy norm. The results appear in Figure 3. We see that the preconditioner is effective, and that σ_est = 1 is an accurate approximation, as the LNLQ error bounds are extremely tight. The CRAIG error bounds are very tight as well.
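The eigenvalue claim behind the choice σ_est = 1 is easy to check numerically: with A = [A_u  A_z] and N = A_u A_uᵀ, the preconditioned operator is N⁻¹AAᵀ = I + N⁻¹A_zA_zᵀ, whose eigenvalues are at least 1. A sketch with random illustrative blocks:

```python
import numpy as np

# Eigenvalues of N^{-1} A A' with N = Au Au' are those of I + N^{-1} Az Az',
# hence >= 1 (N^{-1} Az Az' is similar to a positive semidefinite matrix).
rng = np.random.default_rng(5)
m = 6
Au = rng.standard_normal((m, m)) + 3.0 * np.eye(m)  # nonsingular block
Az = rng.standard_normal((m, 4))
A = np.hstack([Au, Az])
N = Au @ Au.T

eigs = np.linalg.eigvals(np.linalg.solve(N, A @ A.T))
min_eig = eigs.real.min()
```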
8. Extension to symmetric quasi-definite systems. Given symmetric positive definite M and N whose inverses can be applied efficiently, LNLQ generalizes to the solution of the symmetric quasi-definite (Vanderbei, 1995) system

(46)    K [ x ; y ] := [ M  Aᵀ ; A  −N ] [ x ; y ] = [ 0 ; b ],

which represents the optimality conditions of both

(47)    minimize_{x,y}  ½ ‖x‖²_M + ½ ‖y‖²_N    subject to  Ax − Ny = b,

and

(48)    minimize_x  ½ ‖Ax − b‖²_{N⁻¹} + ½ ‖x‖²_M.

The only changes required are to replace Algorithm 1 by the generalized Golub–Kahan process (Orban and Arioli, 2017, Algorithm 4.2) and to set the regularization parameter λ := 1. The latter requires one system solve with M and one system solve with N per iteration.
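The equivalence between (46) and (48) can be verified directly: eliminating y from K[x; y] = [0; b] yields the normal equations of the regularized least-squares problem. A numpy sketch with random SPD M and N (illustrative data only):

```python
import numpy as np

# Solve the SQD system [M A'; A -N][x; y] = [0; b] and verify that x
# satisfies (A'N^{-1}A + M) x = A'N^{-1} b, the optimality conditions of (48).
rng = np.random.default_rng(6)
m, n = 5, 8
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
Mh = rng.standard_normal((n, n))
M = Mh @ Mh.T + n * np.eye(n)   # symmetric positive definite
Nh = rng.standard_normal((m, m))
N = Nh @ Nh.T + m * np.eye(m)   # symmetric positive definite

K = np.block([[M, A.T], [A, -N]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(n), b]))
x, y = sol[:n], sol[n:]

# From the second block row, y = N^{-1}(Ax - b); substitute into the first:
resid = M @ x + A.T @ np.linalg.solve(N, A @ x - b)
```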
Fig. 3. Error in x (top) and y (bottom) along the preconditioned LNLQ iterations (left) and CRAIG iterations (right). The red line represents error bounds using quadrature with σest “ 1, and the green line is the error bound from Arioli (2013).
Applying LSLQ (Estrin et al., 2017) to (48) is implicitly equivalent to applying SYMMLQ to the normal equations

(49)    (Aᵀ N⁻¹ A + M) x = Aᵀ N⁻¹ b,

while applying LNLQ to (47) is equivalent to applying SYMMLQ to the normal equations of the second kind:

(50)    (A M⁻¹ Aᵀ + N) y = b,    M x = Aᵀ y,

where we changed the sign of y to avoid distracting minus signs. In lieu of (4), the generalized Golub–Kahan process can be summarized as

(51a)    A V_k = N U_{k+1} B_k,
(51b)    Aᵀ U_{k+1} = M V_k B_kᵀ + α_{k+1} M v_{k+1} e_{k+1}ᵀ = M V_{k+1} L_{k+1}ᵀ,

where this time U_kᵀ N U_k = I and V_kᵀ M V_k = I in exact arithmetic. Pasting (51) together yields

    [ M  Aᵀ ; A  −N ] [ V_k  0 ; 0  U_k ] = [ M V_k  0 ; 0  N U_k ] [ I  L_kᵀ ; L_k  −I ] + [ 0 ; β_{k+1} N u_{k+1} ] e_{2k}ᵀ,

    [ M  Aᵀ ; A  −N ] [ V_k  0 ; 0  U_{k+1} ] = [ M V_k  0 ; 0  N U_{k+1} ] [ I  B_kᵀ ; B_k  −I ] + [ α_{k+1} M v_{k+1} ; 0 ] e_{2k+1}ᵀ.
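The recurrences (51) are straightforward to prototype. The sketch below is a dense numpy transcription (our own illustrative implementation, not the authors' code), written under the convention U_kᵀNU_k = I and V_kᵀMV_k = I, with dense solves standing in for efficient application of M⁻¹ and N⁻¹:

```python
import numpy as np

def generalized_golub_kahan(A, b, M, N, k):
    """Prototype generalized Golub-Kahan process:
    A V_k = N U_{k+1} B_k with U'NU = I and V'MV = I."""
    U, V, alphas, betas = [], [], [], []
    w = np.linalg.solve(N, b)
    beta = np.sqrt(b @ w)                      # ||b||_{N^{-1}}
    U.append(w / beta)
    betas.append(beta)
    p = np.linalg.solve(M, A.T @ U[0])
    alpha = np.sqrt(p @ M @ p)                 # M-norm before scaling
    V.append(p / alpha)
    alphas.append(alpha)
    for j in range(k):
        w = np.linalg.solve(N, A @ V[j]) - alphas[j] * U[j]
        beta = np.sqrt(w @ N @ w)
        U.append(w / beta)
        betas.append(beta)
        p = np.linalg.solve(M, A.T @ U[j + 1]) - beta * V[j]
        alpha = np.sqrt(p @ M @ p)
        V.append(p / alpha)
        alphas.append(alpha)
    return np.array(U).T, np.array(V).T, alphas, betas

rng = np.random.default_rng(7)
m, n, k = 6, 9, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
Mh = rng.standard_normal((n, n))
M = Mh @ Mh.T + n * np.eye(n)
Nh = rng.standard_normal((m, m))
N = Nh @ Nh.T + m * np.eye(m)
U, V, alphas, betas = generalized_golub_kahan(A, b, M, N, k)
```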
These relations correspond to a Lanczos process applied to (46) with preconditioner blkdiag(M, N). The small SQD matrix on the right-hand side of the previous identities is a symmetric permutation of the Lanczos tridiagonal, which is found by restoring the order in which the Lanczos vectors (v_k, 0) and (0, u_k) are generated:

    T_{2k+1} = [ 1  α_1 ; α_1  −1  β_2 ; β_2  1  ⋱ ; ⋱  ⋱  α_k ; α_k  −1  β_{k+1} ; β_{k+1}  1 ] = [ T_{2k}  β_{k+1} e_{2k} ; β_{k+1} e_{2k}ᵀ  1 ].
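The permutation claim can be checked mechanically for small k. The numpy sketch below (arbitrary illustrative α_j, β_j; the interleaving order is our assumption for the demonstration) permutes the 2×2-block SQD matrix back to interleaved order and verifies that the result is tridiagonal with off-diagonals α_1, β_2, α_2, β_3, …:

```python
import numpy as np

# S = [I L'; L -I] becomes tridiagonal under the symmetric permutation that
# interleaves the two vector families (here taken as u1, v1, u2, v2, ...).
k = 4
alpha = np.array([2.0, 1.5, 3.0, 1.8])   # diagonal of L (illustrative)
beta = np.array([0.7, 0.4, 0.9])         # subdiagonal of L
L = np.diag(alpha) + np.diag(beta, -1)

S = np.block([[np.eye(k), L.T], [L, -np.eye(k)]])

order = []
for j in range(k):
    order += [k + j, j]                  # u_j then v_j
T = S[np.ix_(order, order)]

# Superdiagonal of T interleaves the alphas and betas.
offdiag = np.diag(T, 1)
expected = np.empty(2 * k - 1)
expected[0::2] = alpha
expected[1::2] = beta
```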
Saunders (1995) and Orban and Arioli (2017) show that the CG iterates are well defined for (46) even though K is indefinite. In a similar vein, Orban and Arioli (2017) establish that applying MINRES to (46) with the block-diagonal preconditioner produces alternating preconditioned LSMR and LSQR iterations, where LSMR is applied to (49) and LSQR is applied to (50). It turns out that SYMMLQ applied directly to (46) with this preconditioner satisfies the following property: even iterations are CG iterations, while odd iterations take a zero step and make no progress. Thus every other iteration is wasted, and the generalized iterative methods of Orban and Arioli (2017), LSLQ, or LNLQ should be used instead. The property is formalized in the following result.

Theorem 2. Let x_k^LQ and x_k^CG be the iterates generated at iteration k of SYMMLQ and CG applied to (46), and let x_k^C be the iterate defined in (6). Then for k ≥ 1, x_{2k}^LQ = x_{2k−1}^LQ = x_{2k}^CG = x_k^C.
Proof. For brevity, we use the notation from (Estrin et al., 2016, §2.1) to describe the Lanczos process and how to construct the CG and SYMMLQ iterates. By (51), T_k and the L factor of its LQ factorization have the form

    T_k = [ 1  t_2 ; t_2  −1  t_3 ; t_3  1  ⋱ ; ⋱  ⋱  t_k ; t_k  (−1)^{k−1} ],
    L_k = [ γ_1 ; δ_2  γ_2 ; ε_3  δ_3  γ_3 ; ⋱  ⋱  ⋱ ; ε_k  δ_k  γ̄_k ],

where each t_i is a scalar. For k ≥ 2, the LQ factorization is accomplished using reflections defined by

    [ γ̄_{k−1}  0 ; δ̄_k  (−1)^{k−1} ; 0  t_{k+1} ] [ c_k  s_k ; s_k  −c_k ] = [ γ_{k−1}  0 ; δ_k  γ̄_k ; ε_{k+1}  δ̄_{k+1} ],

with γ̄_1 = 1, δ̄_2 = t_2, c_k = γ̄_{k−1}/γ_{k−1}, and s_k = t_k/γ_{k−1}, where γ_{k−1} = (γ̄_{k−1}² + t_k²)^{1/2}, so that ε_{k+1} = t_{k+1} s_k and δ̄_{k+1} = −t_{k+1} c_k.

We show that δ_j = 0 for all j by showing that γ̄_k = (−1)^k/c_k for k ≥ 2, because in that case

    δ_k = δ̄_k c_k + (−1)^{k−1} s_k = (−t_k c_{k−1}) c_k + (−1)^{k−1} s_k = −(−1)^{k−1} s_k + (−1)^{k−1} s_k = 0,

where we used t_k c_{k−1} c_k = c_{k−1} γ̄_{k−1} s_k = (−1)^{k−1} s_k.

For k = 2 we have γ_1² = 1 + t_2² and c_2 = 1/γ_1, so that γ̄_2 = δ̄_2 s_2 + c_2 = t_2²/γ_1 + 1/γ_1 = γ_1 = 1/c_2, and δ_2 = δ̄_2 c_2 − s_2 = t_2/γ_1 − t_2/γ_1 = 0 directly.
Fig. 4. Error ‖x⋆ − x_k‖ generated by SYMMLQ applied to (46). Notice that every odd iteration makes no progress, resulting in a convergence plot resembling a step function.
Proceeding by induction, assume c_{k−1} = (−1)^{k−1}/γ̄_{k−1}. Then

    γ̄_k = δ̄_k s_k − (−1)^{k−1} c_k = −t_k c_{k−1} s_k − (−1)^{k−1} c_k = −(−1)^{k−1} (s_k²/c_k + c_k) = (−1)^k (s_k² + c_k²)/c_k = (−1)^k/c_k,

where we again used t_k c_{k−1} = (−1)^{k−1} s_k/c_k.

For all k, since δ_k = 0 and x_k^LQ = W_{k−1} z_{k−1} with W_{k−1} having orthonormal columns, and since (z_{k−1})_j = ζ_j is defined by L_{k−1} z_{k−1} = ‖b‖ e_1, we have ζ_k = 0 for k even. Therefore x_{2k}^LQ = x_{2k−1}^LQ. Furthermore, since ζ_k = c_k ζ̄_k and x_k^CG = x_k^LQ + ζ̄_k w̄_k for some w̄_k ⊥ W_k, we have ζ̄_{2k} = 0 and x_{2k}^CG = x_{2k}^LQ. The identity x_{2k}^CG = x_k^C follows from (Saunders, 1995, Result 11).
We illustrate Theorem 2 using a small numerical example. We randomly generate A and b with m = 50, n = 30, M = I, and N = I, and run SYMMLQ directly on (46). We compute x⋆ via Matlab's backslash operator, and compute ‖x_k − x⋆‖ at each iteration to produce Figure 4. The resulting convergence plot resembles a staircase because every odd iteration produces a zero step.
9. Discussion. LNLQ fills a gap in the family of iterative methods for (2) based on the Golub and Kahan (1965) process. Whereas CRAIG is equivalent to CG applied directly to (2), LNLQ is equivalent to SYMMLQ, but is numerically more stable when A is ill-conditioned. The third possibility, MINRES (Paige and Saunders, 1975) applied to (2), is equivalent to LSQR (Paige and Saunders, 1982a,b) because both minimize the residual ‖Ax_k − b‖, where x_k ∈ K_k is implicitly defined as Aᵀy_k. As in the companion method LSLQ (Estrin et al., 2017), an appropriate Gauss–Radau quadrature yields an upper bound on ‖y_k^L − y⋆‖, and transition to the CRAIG point provides an upper bound on ‖y_k^C − y⋆‖. However, it is x_k^C that is updated along orthogonal directions, and not x_k^L. Thus the upper bound on ‖x_k^L − x⋆‖, which we developed for completeness, is deduced from that on ‖x_k^C − x⋆‖. In our numerical experiments, both error bounds are remarkably tight, but ‖x_k^L − x⋆‖ may lag behind ‖x_k^C − x⋆‖ by several orders of magnitude and is not monotonic. Although the bound on ‖y_k^C − y⋆‖ suggested by Arioli (2013) is tighter than might have been anticipated,
Table 1
Comparison of CRAIG and LNLQ properties on min ‖x‖ subject to Ax = b.

Quantity     | CRAIG                  | LNLQ
‖x_k‖        | ↗ (13) and (P, 1974)   | non-monotonic, ≤ CRAIG (Corollary 1)
‖x⋆ − x_k‖   | ↘ (11) and (P, 1974)   | non-monotonic, ≥ CRAIG (Corollary 1)
‖y_k‖        | ↗ (22) and (HS, 1952)  | ↗ (22) and (PS, 1975), ≤ CRAIG (EOS, 2016)
‖y⋆ − y_k‖   | ↘ (22) and (HS, 1952)  | ↘ (22) and (PS, 1975), ≥ CRAIG (EOS, 2016)
‖r⋆ − r_k‖   | non-monotonic          | non-monotonic
‖r_k‖        | non-monotonic          | non-monotonic

↗ = monotonically increasing; ↘ = monotonically decreasing. EOS: Estrin et al. (2016); HS: Hestenes and Stiefel (1952); P: Paige (1974); PS: Paige and Saunders (1975).
the sliding window strategy allows us to tighten it further at the expense of a few extra scalar operations per iteration. All error upper bounds mentioned above depend on an appropriate Gauss–Radau quadrature, which has been observed to become numerically inaccurate below a certain error level (Meurant and Tichý, 2014, 2015). This inaccuracy causes the loosening of the bounds observed in section 7. Should a more stable computation of the Gauss–Radau quadrature become available, all error upper bounds would improve, including those from the sliding window approach, which would become tight throughout all iterations. USYMLQ, based on the orthogonal tridiagonalization process of Saunders, Simon, and Yip (1988), coincides with SYMMLQ when applied to consistent symmetric systems. For (2) it also coincides with LNLQ, but it would be wasteful to apply USYMLQ directly to (2). Fong and Saunders (2012, Table 5.1) summarize the monotonicity of various quantities related to LSQR and LSMR iterations. Table 1 is similar but focuses on CRAIG and LNLQ.
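The CRAIG column of Table 1 is easy to reproduce in a toy setting. The sketch below is a minimal dense CRAIG implementation of our own (illustrative only: no reorthogonalization or stopping tests, not the authors' code), used to observe that ‖x_k‖ increases and ‖x⋆ − x_k‖ decreases monotonically:

```python
import numpy as np

def craig(A, b, iters):
    """Minimal dense CRAIG via Golub-Kahan bidiagonalization.
    Returns the iterates x_1, ..., x_iters (illustrative sketch)."""
    beta = np.linalg.norm(b)
    u = b / beta
    w = A.T @ u
    alpha = np.linalg.norm(w)
    v = w / alpha
    zeta = beta / alpha            # forward substitution on L_k t_k = beta_1 e_1
    x = zeta * v
    xs = [x.copy()]
    for _ in range(iters - 1):
        w = A @ v - alpha * u
        beta = np.linalg.norm(w)
        u = w / beta
        w = A.T @ u - beta * v
        alpha = np.linalg.norm(w)
        v = w / alpha
        zeta = -beta * zeta / alpha
        x = x + zeta * v           # orthogonal update along v_k
        xs.append(x.copy())
    return xs

rng = np.random.default_rng(8)
m, n = 8, 12
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
xs = craig(A, b, m)                # m steps solve the consistent system
xstar = np.linalg.pinv(A) @ b
norms = [np.linalg.norm(x) for x in xs]
errors = [np.linalg.norm(x - xstar) for x in xs]
```

Because the v_k are orthonormal, ‖x_k‖² = Σ_{j≤k} ζ_j² grows while ‖x⋆ − x_k‖² = Σ_{j>k} ζ_j² shrinks, which is the content of the CRAIG rows of the table.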
Acknowledgements. We are grateful to Drew Kouri for the Matlab implementation of the PDE-constrained optimization problems used in the numerical experiments.
References.

M. Arioli. Generalized Golub–Kahan bidiagonalization and stopping criteria. SIAM J. Matrix Anal. Appl., 34(2):571–592, 2013. DOI: 10.1137/120866543.
J. E. Craig. The N-step iteration procedures. J. Math. and Physics, 34(1):64–73, 1955.
T. A. Davis and Y. Hu. The University of Florida sparse matrix collection. ACM Trans. Math. Software, 38(1):1:1–1:25, December 2011. DOI: 10.1145/2049662.2049663.
R. Estrin, D. Orban, and M. A. Saunders. Euclidean-norm error bounds for CG via SYMMLQ. Cahier du GERAD G-2016-70, GERAD, Montréal, QC, Canada, 2016.
R. Estrin, D. Orban, and M. A. Saunders. LSLQ: An iterative method for linear least-squares with an error minimization property. Cahier du GERAD G-2017-05, GERAD, Montréal, QC, Canada, 2017. To appear in SIAM J. Matrix Anal. Appl.
R. Estrin, M. P. Friedlander, D. Orban, and M. A. Saunders. Implementing a smooth exact penalty function for nonlinear optimization. Cahier du GERAD G-2018-XX, GERAD, 2018. In preparation.
R. Fletcher. A class of methods for nonlinear programming: III. Rates of convergence. In F. A. Lootsma, editor, Numerical Methods for Nonlinear Optimization. Academic Press, New York, 1973.
D. C.-L. Fong and M. A. Saunders. CG versus MINRES: An empirical comparison. SQU Journal for Science, 17(1):44–62, 2012.
G. H. Golub and W. Kahan. Calculating the singular values and pseudo-inverse of a matrix. SIAM J. Numer. Anal., 2(2):205–224, 1965. DOI: 10.1137/0702016.
G. H. Golub and G. Meurant. Matrices, moments and quadrature II: How to compute the norm of the error in iterative methods. BIT, 37(3):687–705, 1997. DOI: 10.1007/BF02510247.
M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Standards, 49(6):409–436, 1952.
W. J. Kammerer and M. Z. Nashed. On the convergence of the conjugate gradient method for singular linear operator equations. SIAM J. Numer. Anal., 9(1):165–181, 1972. DOI: 10.1137/0709016.
C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Nat. Bur. Standards, 45:225–280, 1950.
G. Meurant and P. Tichý. A new algorithm for computing quadrature-based bounds in conjugate gradients. Presentation, 2014. URL http://www.cs.cas.cz/tichy/download/present/2014Spa.pdf.
G. Meurant and P. Tichý. On the numerical behavior of quadrature based bounds for the A-norm of the error in CG. Presentation, 2015. URL http://www.cs.cas.cz/~tichy/download/present/2015ALA.pdf.
D. Orban and M. Arioli. Iterative Solution of Symmetric Quasi-Definite Linear Systems, volume 3 of Spotlights. SIAM, 2017. DOI: 10.1137/1.9781611974737. URL http://bookstore.siam.org/sl03.
C. C. Paige. Bidiagonalization of matrices and solution of linear equations. SIAM J. Numer. Anal., 11(1):197–209, 1974. DOI: 10.1137/0711019.
C. C. Paige and M. A. Saunders. Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal., 12(4):617–629, 1975. DOI: 10.1137/0712047.
C. C. Paige and M. A. Saunders. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software, 8(1):43–71, 1982a. DOI: 10.1145/355984.355989.
C. C. Paige and M. A. Saunders. Algorithm 583; LSQR: Sparse linear equations and least-squares problems. ACM Trans. Math. Software, 8(2):195–209, 1982b. DOI: 10.1145/355993.356000.
M. A. Saunders. Solution of sparse rectangular systems using LSQR and CRAIG. BIT Numerical Mathematics, 35:588–604, 1995. DOI: 10.1007/BF01739829.
M. A. Saunders, H. D. Simon, and E. L. Yip. Two conjugate-gradient-type methods for unsymmetric linear equations. SIAM J. Numer. Anal., 25(4):927–940, 1988. DOI: 10.1137/0725052.
G. W. Stewart. The QLP approximation to the singular value decomposition. SIAM J. Sci. Comput., 20(4):1336–1348, 1999. DOI: 10.1137/S1064827597319519.
R. J. Vanderbei. Symmetric quasi-definite matrices. SIAM J. Optim., 5(1):100–113, 1995. DOI: 10.1137/0805005.