Spectral residual method without gradient information for solving large-scale nonlinear systems of equations: Theory and experiments

William La Cruz∗
Departamento de Electrónica, Computación y Control, Universidad Central de Venezuela, Caracas, Venezuela. E-mail: [email protected]

José Mario Martínez†
Department of Applied Mathematics, IMECC-UNICAMP, University of Campinas, CP 6065, 13081-970 Campinas SP, Brazil. E-mail: [email protected]

Marcos Raydan‡
Departamento de Computación, Universidad Central de Venezuela, Apartado 47002, Caracas 1041-A, Venezuela. E-mail: [email protected]

Technical Report RT-04-08, July 2004
Abstract

A fully derivative-free spectral residual method for solving large-scale nonlinear systems of equations is presented. It uses in a systematic way the residual vector as a search direction, a spectral steplength that produces a nonmonotone process, and a globalization strategy that allows this nonmonotone behavior. The global convergence analysis of the combined scheme is presented. An extensive set of numerical experiments is also presented; it indicates that the new combination is competitive with, and frequently better than, well-known Newton-Krylov methods for large-scale problems.

∗Supported by Agenda Petróleo UCV-PROJECT 97-003769.
†Supported by PRONEX-Optimization 76.79.1008-00, FAPESP (Grant 90-3724-6).
‡Supported by Agenda Petróleo UCV-PROJECT 97-003769.
1  Introduction
We introduce a derivative-free nonmonotone iterative method for solving the nonlinear system of equations
$$ F(x) = 0, \qquad (1) $$
where $F: \mathbb{R}^n \to \mathbb{R}^n$ is a continuously differentiable mapping. We are interested in large-scale systems for which the Jacobian of $F$ is not available or requires a prohibitive amount of storage.

Recently, La Cruz and Raydan [22] introduced the Spectral Algorithm for Nonlinear Equations (SANE) for solving (1). SANE systematically uses the residual $\pm F(x_k)$ as search direction. The first trial point at each iteration is $x_k - \sigma_k F(x_k)$, where $\sigma_k$ is a spectral coefficient. Global convergence is guaranteed by means of a variation of the nonmonotone strategy of Grippo, Lampariello and Lucidi [19]. This approach requires descent directions with respect to the squared norm of the residual. As a consequence, the computation of a directional derivative, or a very good approximation of it, is necessary at every iteration. The spectral coefficient is an appropriate Rayleigh quotient with respect to a secant approximation of the Jacobian.

Spectral gradient methods for minimization originated in the Barzilai-Borwein paper [2]. The properties of their method for general quadratic functions were elucidated in [30]. Further analysis of spectral gradient methods can be found in [10, 14, 31], among others. For a review of the more recent advances on spectral choices of the steplength for minimization problems, see [15].

In this paper we introduce a new nonmonotone line-search technique that can be associated with the same search directions and the same initial steplengths as the SANE algorithm. In other words, the first trial point at each iteration will be the same as in SANE, but the line-search strategy is different. The main consequence is that, in the new approach, directional derivatives are not required at all. We also present an extensive set of numerical experiments which indicate that the new method is competitive and sometimes better than the SANE algorithm. We recall that, in [22], SANE was in turn compared favorably with several Newton-Krylov methods (see, e.g., [3, 8, 9]). Therefore, the new algorithm represents an encouraging low-cost scheme for solving (1).

Notation.

• $J(x)$ will denote the Jacobian matrix of $F$ computed at $x$.

• For all $x \in \mathbb{R}^n$ we denote $g(x) = 2 J(x)^t F(x) = \nabla \|F(x)\|_2^2$.

• The set of natural numbers will be denoted $\mathbb{N} = \{0, 1, 2, \ldots\}$.

• If $\{z_k\}_{k \in \mathbb{N}}$ is a sequence and $K = \{k_1, k_2, k_3, \ldots\}$ is an infinite subset of $\mathbb{N}$ such that $k_i < k_j$ if $i < j$, we denote $\lim_{k \in K} z_k = \lim_{j \to \infty} z_{k_j}$.

• The symbol $\|\cdot\|$ will always denote the Euclidean norm.

• $B(x, \varepsilon)$ will denote the open ball with center $x$ and radius $\varepsilon$, that is, $B(x, \varepsilon) = \{z \in \mathbb{R}^n \mid \|z - x\| < \varepsilon\}$.
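To fix the computational setting for the sketches that accompany later sections, the short Python fragment below is our own illustration (not part of the original report): it assumes the residual $F$ is available as a callable returning a NumPy array, and defines the merit function $f(x) = \|F(x)\|^{nexp}$ used from Section 3 on. The toy residual `F_toy` is a hypothetical two-dimensional example introduced only to exercise the later sketches.

```python
import numpy as np

def merit(F, x, nexp=2):
    """Merit function f(x) = ||F(x)||^nexp, nexp in {1, 2} (see Section 3)."""
    return np.linalg.norm(F(x)) ** nexp

# Hypothetical toy residual (not one of the test problems of Appendix A):
# F(x) = (x_1^2 - 1, x_2 + x_1 - 2).
def F_toy(x):
    return np.array([x[0] ** 2 - 1.0, x[1] + x[0] - 2.0])

print(merit(F_toy, np.array([2.0, 0.0])))  # F([2,0]) = (3, 0), so f = 3^2 = 9
```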
2  The new nonmonotone line-search strategy
The best known nonmonotone line-search technique for unconstrained optimization was introduced by Grippo, Lampariello and Lucidi [19]. It has been used to globalize the spectral gradient method [31] and some of its extensions for convex constrained optimization [5, 6] and nonlinear systems of equations [22]. Different nonmonotone line-search techniques, associated with Newton and quasi-Newton strategies, have been proposed for solving (1); see [17, 24]. Li and Fukushima [24] presented an interesting idea by means of which descent directions are not necessary to guarantee that each iteration is well defined.

Let us briefly describe the Grippo-Lampariello-Lucidi (GLL) and the Li-Fukushima (LF) schemes. The GLL condition can be written as follows:
$$ f(x_k + \alpha_k d_k) \le \max_{0 \le j \le M-1} f(x_{k-j}) + \gamma \alpha_k \nabla f(x_k)^t d_k, $$
where $M$ is a nonnegative integer, $0 < \gamma < 1$ and $f$ is a merit function such that $f(x) = 0$ if and only if $F(x) = 0$. The LF condition can be written as follows:
$$ \|F(x_k + \alpha_k d_k)\| \le (1 + \eta_k) \|F(x_k)\| - \gamma \alpha_k^2 \|d_k\|_2^2, $$
where $\sum_k \eta_k \le \eta < \infty$.

It follows that if $\nabla f(x_k)^t d_k < 0$, then the GLL condition is satisfied for $\alpha_k$ sufficiently close to zero and we can compute a steplength $\alpha_k$ by using a finite backtracking process. However, when $\nabla f(x_k)^t d_k = 0$, the existence of $\alpha_k$ satisfying the GLL condition is not guaranteed. Unfortunately, when $d_k = \pm F(x_k)$ for solving (1), $\nabla f(x_k)^t d_k = \pm 2 F(x_k)^t J(x_k) F(x_k)$ could be zero or close to zero, and then stagnation or breakdown will occur during the backtracking process.

One possible remedy is to use the LF condition. This condition does not need the computation of $J(x_k) d_k$. Moreover, it is satisfied, if $\alpha_k$ is small enough, independently of the choice of $d_k$. However, since $\eta_k$ is usually very small when $k$ is large, the Li-Fukushima strategy generally imposes an almost monotone behavior of the merit function when $x_k$ is close to a solution. This is not a good feature when one uses spectral gradient or spectral residual steps because, in these cases, the pure undamped methods (where $\alpha_k = 1$ for all $k$), although generally effective, tend to be highly nonmonotone even in the neighborhood of an isolated solution. The reason for this is not completely understood, but the analogy with the behavior of the spectral gradient (or Barzilai-Borwein) method for minimizing convex quadratics may be useful; see [30]. In the quadratic case the spectral gradient method does not need line-search strategies to be globally convergent, but the functional values do not decrease monotonically at all. Therefore, imposing any kind of monotonicity is not convenient. Many authors, including Fletcher [15], have pointed out the necessity of avoiding monotonicity requirements in the spectral framework as much as possible.
In this work we combine and extend the GLL and LF conditions to produce a robust nonmonotone line-search globalization strategy that takes into account the advantages of both schemes. Roughly speaking, the new descent condition can be written as:
$$ f(x_{k+1}) \le \max_{0 \le j \le M-1} f(x_{k-j}) + \eta_k - \gamma \alpha_k^2 f(x_k). \qquad (2) $$
The GLL term $\max_{0 \le j \le M-1} f(x_{k-j})$ is responsible for the sufficiently nonmonotone behavior of $f(x_k)$ even when $k$ is large. On the other hand, the presence of $\eta_k > 0$ guarantees that all the iterations are well defined, and the forcing term $-\gamma \alpha_k^2 f(x_k)$ provides the arguments for proving global convergence. We would like to mention that a similar extension that also combines the GLL and LF conditions was presented and briefly discussed in the final remarks of [22]. However, it was neither analyzed nor tested. The present work was motivated by the need to study the theoretical and practical properties of this globalization technique.
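To make condition (2) concrete, the following small Python check is a hedged sketch (the function name and the sample numbers are ours): it accepts a trial merit value whenever it does not exceed the maximum of the last $M$ merit values plus the term $\eta_k - \gamma \alpha^2 f(x_k)$.

```python
def accepts(f_trial, f_recent, eta_k, gamma, alpha, f_k):
    """Nonmonotone acceptance test (2): f_trial <= max(recent f's) + eta_k - gamma*alpha^2*f_k."""
    return f_trial <= max(f_recent) + eta_k - gamma * alpha ** 2 * f_k

# Illustrative numbers: the last M = 3 merit values are [4.0, 2.5, 3.0], eta_k = 0.1,
# gamma = 1e-4, alpha = 1, f_k = 3.0. A trial value 3.9 is accepted even though it is
# larger than f_k, which is exactly the nonmonotone behavior allowed by (2).
print(accepts(3.9, [4.0, 2.5, 3.0], 0.1, 1e-4, 1.0, 3.0))  # True
```

Note that, since $\eta_k > 0$, the right-hand side of (2) is strictly larger than the maximum of the recent merit values, which is what makes every iteration well defined (Proposition 1 below).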
3
Model algorithm and convergence
We assume that $F: \mathbb{R}^n \to \mathbb{R}^n$ has continuous partial derivatives. Let $nexp \in \{1, 2\}$. Define $f(x) = \|F(x)\|^{nexp}$ for all $x \in \mathbb{R}^n$. Assume that $\{\eta_k\}$ is a sequence such that $\eta_k > 0$ for all $k \in \mathbb{N}$ and
$$ \sum_{k=0}^{\infty} \eta_k = \eta < \infty. \qquad (3) $$
Assume that $0 < \gamma < 1$ and $0 < \sigma_{min} < \sigma_{max} < \infty$. Let $M$ be a positive integer. Let $\tau_{min}, \tau_{max}$ be such that $0 < \tau_{min} < \tau_{max} < 1$. Given $x_0 \in \mathbb{R}^n$ an arbitrary initial point, the algorithm that allows us to obtain $x_{k+1}$ starting from $x_k$ is given below.

Algorithm DF-SANE (Derivative-free SANE).

Step 1.
• Choose $\sigma_k$ such that $|\sigma_k| \in [\sigma_{min}, \sigma_{max}]$ (the spectral coefficient).
• Compute $\bar f_k = \max\{f(x_k), \ldots, f(x_{\max\{0,\, k-M+1\}})\}$.
• Set $d \leftarrow -\sigma_k F(x_k)$.
• Set $\alpha_+ \leftarrow 1$, $\alpha_- \leftarrow 1$.

Step 2.
If $f(x_k + \alpha_+ d) \le \bar f_k + \eta_k - \gamma \alpha_+^2 f(x_k)$,
then define $d_k = d$, $\alpha_k = \alpha_+$, $x_{k+1} = x_k + \alpha_k d_k$;
else if $f(x_k - \alpha_- d) \le \bar f_k + \eta_k - \gamma \alpha_-^2 f(x_k)$,
then define $d_k = -d$, $\alpha_k = \alpha_-$, $x_{k+1} = x_k + \alpha_k d_k$;
else choose $\alpha_{+new} \in [\tau_{min} \alpha_+, \tau_{max} \alpha_+]$, $\alpha_{-new} \in [\tau_{min} \alpha_-, \tau_{max} \alpha_-]$, replace $\alpha_+ \leftarrow \alpha_{+new}$, $\alpha_- \leftarrow \alpha_{-new}$ and go to Step 2.

Remark. As we will see later, the coefficient $\sigma_k$ is intended to be an approximation of the quotient $\|F(x_k)\|^2 / \langle J(x_k) F(x_k), F(x_k) \rangle$. This quotient may be positive or negative (or even null).

Proposition 1. The iteration is well defined.

Proof. Since $\eta_k > 0$, after a finite number of reductions of $\alpha_+$ the condition
$$ f(x_k + \alpha_+ d) \le \bar f_k + \eta_k - \gamma \alpha_+^2 f(x_k) $$
necessarily holds. □
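For readers who want to experiment with the method, the following Python sketch is a hedged, simplified transcription of Algorithm DF-SANE; it is not the authors' implementation. Our assumptions: the simple stopping test on $\|F(x_k)\|$, the choices $\eta_k = \|F(x_0)\|/(1+k)^2$ and $\sigma_0 = 1$ taken from the experiments of Section 4, a crude fallback $\sigma_k = 1$ when $\langle s_k, y_k \rangle$ is negligible instead of the safeguard of Section 4, and a plain halving of the trial steps (halving equals $\tau_{max}\alpha$ with the Section 4 values $\tau_{min} = 0.1$, $\tau_{max} = 0.5$, so it is an admissible choice).

```python
import numpy as np

def df_sane(F, x0, nexp=2, gamma=1e-4, M=10, sigma_min=1e-10, sigma_max=1e10,
            max_iter=1000, tol=1e-8):
    """Hedged sketch of Algorithm DF-SANE (derivative-free spectral residual method)."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    f = np.linalg.norm(Fx) ** nexp
    norm_F0 = np.linalg.norm(Fx)
    f_hist = [f]                       # merit values of the last (at most M) iterates
    sigma = 1.0                        # sigma_0 = 1, as in the experiments of Section 4
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= tol:  # simple stopping test (our assumption)
            break
        eta_k = norm_F0 / (1.0 + k) ** 2   # summable forcing sequence, as in Section 4
        f_bar = max(f_hist)
        d = -sigma * Fx                # Step 1: spectral residual direction
        alpha_p = alpha_m = 1.0
        while True:                    # Step 2: derivative-free nonmonotone line search
            x_trial = x + alpha_p * d
            F_trial = F(x_trial)
            f_trial = np.linalg.norm(F_trial) ** nexp
            if f_trial <= f_bar + eta_k - gamma * alpha_p ** 2 * f:
                x_new = x_trial
                break
            x_trial = x - alpha_m * d
            F_trial = F(x_trial)
            f_trial = np.linalg.norm(F_trial) ** nexp
            if f_trial <= f_bar + eta_k - gamma * alpha_m ** 2 * f:
                x_new = x_trial
                break
            # The algorithm allows any alpha_new in [tau_min*alpha, tau_max*alpha];
            # here we simply halve (the parabolic model of Section 4 is a better choice).
            alpha_p *= 0.5
            alpha_m *= 0.5
        s = x_new - x                  # secant pair for the next spectral coefficient
        y = F_trial - Fx
        x, Fx, f = x_new, F_trial, f_trial
        f_hist.append(f)
        if len(f_hist) > M:
            f_hist.pop(0)
        denom = float(np.dot(s, y))
        sigma = float(np.dot(s, s)) / denom if abs(denom) > 1e-30 else 1.0
        # keep |sigma_k| inside [sigma_min, sigma_max], as required by Step 1
        sigma = np.sign(sigma) * min(max(abs(sigma), sigma_min), sigma_max)
    return x, Fx
```

Usage: for any residual callable `F` returning a NumPy array, `x, Fx = df_sane(F, x0)`. The helpers sketched in Section 4 (the safeguarded spectral coefficient and the parabolic step update) can replace the crude fallback and the fixed halving indicated by the comments.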
In order to prove the convergence of the algorithm, we need some preliminary definitions (see [7]). For all $k = 1, 2, \ldots$ we define
$$ V_k = \max\{f(x_{(k-1)M+1}), \ldots, f(x_{kM})\}. $$
Let $\nu(k) \in \{(k-1)M+1, \ldots, kM\}$ be such that, for all $k = 1, 2, \ldots$, $f(x_{\nu(k)}) = V_k$. Clearly,
$$ f(x_{kM+1}) \le \max\{f(x_{(k-1)M+1}), \ldots, f(x_{kM})\} + \eta_{kM} - \gamma \alpha_{kM}^2 f(x_{kM}) = V_k + \eta_{kM} - \gamma \alpha_{kM}^2 f(x_{kM}) \le V_k + \eta_{kM}, $$
$$ f(x_{kM+2}) \le \max\{V_k, f(x_{kM+1})\} + \eta_{kM+1} - \gamma \alpha_{kM+1}^2 f(x_{kM+1}) \le V_k + \eta_{kM} + \eta_{kM+1} - \gamma \alpha_{kM+1}^2 f(x_{kM+1}) \le V_k + \eta_{kM} + \eta_{kM+1}, $$
and so on. Therefore, by an inductive argument,
$$ f(x_{kM+\ell}) \le V_k + \sum_{j=0}^{\ell-1} \eta_{kM+j} - \gamma \alpha_{kM+\ell-1}^2 f(x_{kM+\ell-1}) \qquad (4) $$
for all $\ell = 1, 2, \ldots, M$. But $\nu(k+1) \in \{kM+1, \ldots, kM+M\}$, thus
$$ V_{k+1} = f(x_{\nu(k+1)}) \le V_k + \sum_{j=0}^{M-1} \eta_{kM+j} - \gamma \alpha_{\nu(k+1)-1}^2 f(x_{\nu(k+1)-1}). $$
So, for all $k = 1, 2, \ldots$ we have that
$$ f(x_{\nu(k+1)}) \le f(x_{\nu(k)}) + \sum_{j=0}^{M-1} \eta_{kM+j} - \gamma \alpha_{\nu(k+1)-1}^2 f(x_{\nu(k+1)-1}). \qquad (5) $$
Using (4) and (5) we can prove the following propositions.

Proposition 2. For all $k, \ell = 1, 2, \ldots$,
$$ f(x_{kM+\ell}) \le f(x_{\nu(k)}) + \sum_{i=\nu(k)}^{\infty} \eta_i \le f(x_{\nu(k)}) + \eta. \qquad (6) $$

Proof. Straightforward, using (4) and (5). □

Proposition 3.
$$ \lim_{k \to \infty} \alpha_{\nu(k)-1}^2 f(x_{\nu(k)-1}) = 0. $$
Proof. Write the inequalities (5) for $k = 1, 2, \ldots, L$. Observe that $f(x_{\nu(k+1)})$ occurs on the left-hand side of the $k$-th inequality and also on the right-hand side of the $(k+1)$-th inequality. Adding the $L$ inequalities, we get:
$$ f(x_{\nu(L+1)}) \le f(x_{\nu(1)}) + \sum_{j=M}^{(L+1)M-1} \eta_j - \gamma \sum_{j=1}^{L} \alpha_{\nu(j+1)-1}^2 f(x_{\nu(j+1)-1}). $$
Therefore, for all $L = 1, 2, \ldots$, we obtain:
$$ \gamma \sum_{j=1}^{L} \alpha_{\nu(j+1)-1}^2 f(x_{\nu(j+1)-1}) \le f(x_{\nu(1)}) + \sum_{j=M}^{(L+1)M-1} \eta_j - f(x_{\nu(L+1)}) \le f(x_{\nu(1)}) + \sum_{j=M}^{(L+1)M-1} \eta_j \le f(x_{\nu(1)}) + \eta. $$
So, the series $\sum_{j=1}^{\infty} \alpha_{\nu(j+1)-1}^2 f(x_{\nu(j+1)-1})$ is convergent. This implies the desired result. □
From now on we define
$$ K = \{\nu(1)-1, \nu(2)-1, \nu(3)-1, \ldots\} \qquad (7) $$
and
$$ K_+ = \{\nu(1), \nu(2), \nu(3), \ldots\}. \qquad (8) $$
Observe that
$$ \nu(j+1) \le \nu(j) + 2M - 1 \quad \text{for all } j = 1, 2, \ldots \qquad (9) $$
In Theorem 1 we prove that at every limit point $x_*$ of the subsequence $\{x_k\}_{k \in K}$ one necessarily has $\langle J(x_*) F(x_*), F(x_*) \rangle = \langle F(x_*), g(x_*) \rangle = 0$. In other words, the gradient of $\|F(x)\|^2$ at $x_*$ is orthogonal to the residual $F(x_*)$.
Theorem 1. Assume that $\{x_k\}_{k \in \mathbb{N}}$ is generated by the DF-SANE algorithm. Then, every limit point $x_*$ of $\{x_k\}_{k \in K}$ satisfies
$$ \langle F(x_*), J(x_*)^t F(x_*) \rangle = 0. \qquad (10) $$

Proof. By Proposition 3 we have that
$$ \lim_{k \in K} \alpha_k^2 f(x_k) = 0. \qquad (11) $$
Let $x_*$ be a limit point of $\{x_k\}_{k \in K}$. Let $K_1 \subset K$ be an infinite sequence of indices such that $\lim_{k \in K_1} x_k = x_*$. Then, by (11),
$$ \lim_{k \in K_1} \alpha_k^2 f(x_k) = 0. \qquad (12) $$
If $\{\alpha_k\}_{k \in K_1}$ does not tend to zero, there exists an infinite sequence of indices $K_2 \subset K_1$ such that $\alpha_k$ is bounded away from zero for $k \in K_2$. Then, by (12), $\lim_{k \in K_2} f(x_k) = 0$. Since $f$ is continuous and $\lim_{k \in K_2} x_k = x_*$, this implies that $f(x_*) = 0$. So, we only need to analyze the case
$$ \lim_{k \in K_1} \alpha_k = 0. \qquad (13) $$
At Step 2 of Algorithm DF-SANE one tests the inequality
$$ f(x_k + \alpha_+ d) \le \bar f_k + \eta_k - \gamma \alpha_+^2 f(x_k). \qquad (14) $$
If (14) does not hold, the inequality
$$ f(x_k - \alpha_- d) \le \bar f_k + \eta_k - \gamma \alpha_-^2 f(x_k) \qquad (15) $$
is tested. The first trial steps at (14)-(15) are $\alpha_+ = \alpha_- = 1$. By (13), there exists $k_0 \in K_1$ such that $\alpha_k < 1$ for all $k \ge k_0$, $k \in K_1$. Therefore, for those iterations $k$, the step $\alpha_k$ that fulfills (14) or (15) is necessarily preceded by steps $\alpha_k^+$ and $\alpha_k^-$ that do not satisfy (14) and (15), respectively. Now, by the choice of $\alpha_{+new}$ and $\alpha_{-new}$ at Step 2 of the algorithm, for all $k \ge k_0$, $k \in K_1$, there exists $m_k \in \mathbb{N}$ such that
$$ \alpha_k \ge \tau_{min}^{m_k}. $$
So, by (13), $\lim_{k \in K_1} m_k = \infty$. But, again by the choice of $\alpha_{+new}$ and $\alpha_{-new}$,
$$ \alpha_k^+ \le \tau_{max}^{m_k - 1} \quad \text{and} \quad \alpha_k^- \le \tau_{max}^{m_k - 1}. $$
Therefore, since $\tau_{max} < 1$,
$$ \lim_{k \in K_1} \alpha_k^+ = \lim_{k \in K_1} \alpha_k^- = 0. $$
Clearly, the fact that (14) and (15) are not satisfied by $\alpha_k^+$ and $\alpha_k^-$, respectively, implies that
$$ f(x_k - \alpha_k^+ \sigma_k F(x_k)) > \bar f_k + \eta_k - \gamma (\alpha_k^+)^2 f(x_k) \qquad (16) $$
and
$$ f(x_k + \alpha_k^- \sigma_k F(x_k)) > \bar f_k + \eta_k - \gamma (\alpha_k^-)^2 f(x_k) \qquad (17) $$
for all $k \in K_1$, $k \ge k_0$. The inequality (16) implies that
$$ f(x_k - \alpha_k^+ \sigma_k F(x_k)) > f(x_k) - \gamma (\alpha_k^+)^2 f(x_k). $$
So,
$$ f(x_k - \alpha_k^+ \sigma_k F(x_k)) - f(x_k) \ge -\gamma (\alpha_k^+)^2 f(x_k). $$
By Proposition 2, $\{f(x_k)\}_{k \in \mathbb{N}}$ is bounded above, say, by a constant $c > 0$. Thus,
$$ f(x_k - \alpha_k^+ \sigma_k F(x_k)) - f(x_k) \ge -c \gamma (\alpha_k^+)^2. \qquad (18) $$
If $nexp = 2$ this means that
$$ \|F(x_k - \alpha_k^+ \sigma_k F(x_k))\|^2 - \|F(x_k)\|^2 \ge -c \gamma (\alpha_k^+)^2. $$
Now, the subsequence $\{x_k\}_{k \in K_1}$ is convergent and, therefore, bounded. Since $F(x_k)$, $\alpha_k^+$ and $\sigma_k$ are also bounded, we have that $\{x_k - \alpha_k^+ \sigma_k F(x_k)\}_{k \in K_1}$ is bounded. So, by the continuity of $F$, there exists $c_1 > 0$ such that $\|F(x_k - \alpha_k^+ \sigma_k F(x_k))\| + \|F(x_k)\| \le c_1$ for all $k \in K_1$. So, if $nexp = 1$, multiplying both sides of (18) by $\|F(x_k - \alpha_k^+ \sigma_k F(x_k))\| + \|F(x_k)\|$, we obtain:
$$ \|F(x_k - \alpha_k^+ \sigma_k F(x_k))\|^2 - \|F(x_k)\|^2 \ge -c c_1 \gamma (\alpha_k^+)^2. $$
Thus, taking $C = \max\{c, c c_1\}$ we obtain, for $nexp \in \{1, 2\}$, that
$$ \|F(x_k - \alpha_k^+ \sigma_k F(x_k))\|^2 - \|F(x_k)\|^2 \ge -C \gamma (\alpha_k^+)^2. $$
So,
$$ \frac{\|F(x_k - \alpha_k^+ \sigma_k F(x_k))\|^2 - \|F(x_k)\|^2}{\alpha_k^+} \ge -C \gamma \alpha_k^+. $$
By the Mean Value Theorem, there exists $\xi_k \in [0, 1]$ such that
$$ \langle g(x_k - \xi_k \alpha_k^+ \sigma_k F(x_k)), -\sigma_k F(x_k) \rangle \ge -C \gamma \alpha_k^+. $$
Therefore,
$$ \sigma_k \langle g(x_k - \xi_k \alpha_k^+ \sigma_k F(x_k)), -F(x_k) \rangle \ge -C \gamma \alpha_k^+. \qquad (19) $$
Now, if $\sigma_k > 0$ for infinitely many indices $k \in K_2 \subset K_1$, the inequality (19) implies that, for $k \in K_2$, $k \ge k_0$,
$$ \langle g(x_k - \xi_k \alpha_k^+ \sigma_k F(x_k)), F(x_k) \rangle \le \frac{C \gamma \alpha_k^+}{\sigma_k} \le \frac{C \gamma \alpha_k^+}{\sigma_{min}}. \qquad (20) $$
Using (17) and proceeding in the same way, we obtain that, for $k \in K_2$, $k \ge k_0$,
$$ \langle g(x_k + \xi_k \alpha_k^- \sigma_k F(x_k)), F(x_k) \rangle \ge -\frac{C \gamma \alpha_k^-}{\sigma_k} \ge -\frac{C \gamma \alpha_k^-}{\sigma_{min}} \qquad (21) $$
for some $\xi_k \in [0, 1]$. Since $\alpha_k^+ \to 0$, $\alpha_k^- \to 0$, and $\sigma_k F(x_k)$ is bounded, taking limits in (20) and (21), we obtain that
$$ \langle g(x_*), F(x_*) \rangle = 0. \qquad (22) $$
If $\sigma_k < 0$ for infinitely many indices, proceeding in an analogous way, we also deduce (22). Thus, the thesis is proved. □

Corollary 1. Assume that $\{x_k\}_{k \in \mathbb{N}}$ is generated by the DF-SANE algorithm, $x_*$ is a limit point of $\{x_k\}_{k \in K}$ and, for all $v \in \mathbb{R}^n$, $v \ne 0$, $\langle J(x_*) v, v \rangle \ne 0$. Then, $F(x_*) = 0$.

Proof. Straightforward, using Theorem 1. □
As usual, we say that a matrix $A \in \mathbb{R}^{n \times n}$ is positive-definite if $\langle Av, v \rangle > 0$ for all $v \in \mathbb{R}^n$, $v \ne 0$. If $J(x)$ is positive-definite for all $x \in \mathbb{R}^n$ we say that the mapping $F$ is strictly monotone. If $F$ is strictly monotone or $-F$ is strictly monotone, we say that the mapping $F$ is strict. If a mapping is strict and admits a solution, its solution must be unique; see [29], Chapter 5.

Corollary 2. Assume that $\{x_k\}_{k \in \mathbb{N}}$ is generated by the DF-SANE algorithm and the mapping $F$ is strict. Then, every bounded subsequence of $\{x_k\}_{k \in K}$ converges to the solution of (1).

Proof. Straightforward, using Corollary 1. □

Corollary 3. Assume that $\{x_k\}_{k \in \mathbb{N}}$ is generated by the DF-SANE algorithm, the mapping $F$ is strict and the level set $\{x \in \mathbb{R}^n \mid f(x) \le f(x_0) + \eta\}$ is bounded. Then, $\{x_k\}_{k \in K}$ converges to the solution of (1).

Proof. Straightforward, using Corollary 2. □

So far, we have proved that at every limit point $x_*$ of $\{x_k\}_{k \in K}$ the gradient of $\|F(x)\|^2$ is orthogonal to the residual $F(x_*)$. The case in which there exists a limit point of $\{x_k\}_{k \in \mathbb{N}}$ at which $F(x_*) = 0$ deserves further analysis. The theorem below shows that, when such a limit point exists, then all the limit points of the sequence generated by the algorithm are solutions of the nonlinear system.

Theorem 2. Assume that the sequence $\{x_k\}_{k \in \mathbb{N}}$ is generated by the DF-SANE algorithm and that there exists a limit point $x_*$ of $\{x_k\}_{k \in \mathbb{N}}$ such that $F(x_*) = 0$. Then
$$ \lim_{k \to \infty} \|F(x_k)\| = 0. $$
Consequently, $F(x)$ vanishes at every limit point of $\{x_k\}_{k \in \mathbb{N}}$.

Proof. Let $K_1$ be an infinite subset of $\mathbb{N}$ such that
$$ \lim_{k \in K_1} x_k = x_* \quad \text{and} \quad F(x_*) = 0. \qquad (23) $$
Then, $\lim_{k \in K_1} \|F(x_k)\| = 0$. Therefore, since $x_{k+1} = x_k \pm \alpha_k \sigma_k F(x_k)$ and $|\alpha_k \sigma_k| \le \sigma_{max}$ for all $k \in \mathbb{N}$,
$$ \lim_{k \in K_1} \|x_{k+1} - x_k\| = 0. $$
So, $\lim_{k \in K_1} x_{k+1} = x_*$. Proceeding by induction, we may prove that for all fixed $\ell \in \{1, 2, \ldots, 2M\}$,
$$ \lim_{k \in K_1} x_{k+\ell} = x_*. \qquad (24) $$
Now, by (9), for all $k \in K_1$ we can choose $\mu(k) \in \{0, 1, \ldots, 2M-1\}$ such that
$$ k + \mu(k) \in K_+. \qquad (25) $$
Moreover, the same $\mu(k)$ must be repeated infinitely many times. So, there exists $\ell_0 \in \{0, 1, \ldots, 2M-1\}$ such that $\mu(k) = \ell_0$ for infinitely many indices $k \in K_1$. Consider $K_2 = \{k + \mu(k) \mid k \in K_1 \text{ and } \mu(k) = \ell_0\}$. By (24) and (25) we have that $K_2 \subset K_+$ and
$$ \lim_{k \in K_2} x_k = x_*. $$
Then, by (23), $\lim_{k \in K_2} \|F(x_k)\| = 0$. Since $K_2 \subset K_+$, there exists $J_1$, an infinite subset of $\mathbb{N}$, such that
$$ \lim_{j \in J_1} x_{\nu(j)} = x_* \qquad (26) $$
and
$$ \lim_{j \in J_1} f(x_{\nu(j)}) = \lim_{j \in J_1} V_j = 0. \qquad (27) $$
Let us write $J_1 = \{j_1, j_2, j_3, \ldots\}$, where $j_1 < j_2 < j_3 < \ldots$ and $\lim_{i \to \infty} j_i = \infty$. By (27) we have that
$$ \lim_{i \to \infty} V_{j_i} = 0. \qquad (28) $$
Now, by (5) we have that, for all $j \in \mathbb{N}$, $j > j_i$,
$$ V_j \le V_{j_i} + \sum_{\ell = M j_i}^{\infty} \eta_\ell. $$
Therefore,
$$ \sup_{j \ge j_i} V_j \le V_{j_i} + \sum_{\ell = M j_i}^{\infty} \eta_\ell. \qquad (29) $$
By the summability of $\eta_k$,
$$ \lim_{i \to \infty} \sum_{\ell = M j_i}^{\infty} \eta_\ell = 0. $$
Then, by (28), taking limits on both sides of (29), we get $\lim_{i \to \infty} \sup_{j \ge j_i} V_j = 0$. Thus, $\lim_{j \to \infty} V_j = 0$. By the definition of $V_j$ this implies that
$$ \lim_{k \to \infty} \|F(x_k)\| = \lim_{k \to \infty} f(x_k) = 0, \qquad (30) $$
as we wanted to prove. The second part of the proof is straightforward: if $\bar x$ is a limit point of $\{x_k\}_{k \in \mathbb{N}}$, there exists a subsequence $\{x_k\}_{k \in K_3}$ that converges to $\bar x$. By (30) and the continuity of $F$ we have that $F(\bar x) = \lim_{k \in K_3} F(x_k) = 0$. □

Now we prove two theorems of local convergence type. Theorem 3 says that if an isolated solution is a limit point of $\{x_k\}$, then the whole sequence $x_k$ converges to this solution.

Theorem 3. Assume that the sequence $\{x_k\}_{k \in \mathbb{N}}$ is generated by the DF-SANE algorithm and that there exists a limit point $x_*$ of the sequence $\{x_k\}_{k \in \mathbb{N}}$ such that $F(x_*) = 0$. Moreover, assume that there exists $\delta > 0$ such that $F(x) \ne 0$ whenever $0 < \|x - x_*\| \le \delta$. Then, $\lim_{k \to \infty} x_k = x_*$.

Proof. By Theorem 2 we have that $\lim_{k \to \infty} \|F(x_k)\| = 0$. Therefore, since $\alpha_k$ and $\sigma_k$ are bounded,
$$ \lim_{k \to \infty} \|x_{k+1} - x_k\| = 0. $$
Thus, there exists $k_1 \in \mathbb{N}$ such that
$$ \|x_{k+1} - x_k\| \le \delta/2 \quad \text{for all } k \ge k_1. \qquad (31) $$
Consider the set
$$ S = \{x \in \mathbb{R}^n \mid \delta/2 \le \|x - x_*\| \le \delta\}. $$
By hypothesis, $S$ does not contain any solution of $F(x) = 0$. But, by Theorem 2, all the limit points of $\{x_k\}_{k \in \mathbb{N}}$ are solutions of (1). Therefore, $S$ does not contain any limit point of $\{x_k\}_{k \in \mathbb{N}}$. Thus, since $S$ is compact, it cannot contain infinitely many iterates $x_k$. This implies that there exists $k_2 \in \mathbb{N}$ such that
$$ x_k \notin S \quad \text{for all } k \ge k_2. \qquad (32) $$
Let $k_3 \ge \max\{k_1, k_2\}$ be such that $\|x_{k_3} - x_*\| \le \delta/2$. By (31), we have
$$ \|x_{k_3+1} - x_*\| \le \|x_{k_3} - x_*\| + \|x_{k_3+1} - x_{k_3}\| \le \delta. $$
But, by (32), $x_{k_3+1} \notin S$; therefore, we have that $\|x_{k_3+1} - x_*\| \le \delta/2$. Continuing this argument inductively we have that
$$ \|x_k - x_*\| \le \delta/2 \quad \text{for all } k \ge k_3. \qquad (33) $$
This implies that all the limit points $\bar x$ of the sequence $\{x_k\}_{k \in \mathbb{N}}$ are such that $\|\bar x - x_*\| \le \delta/2$. By Theorem 2, $F(\bar x) = 0$ at every limit point $\bar x$ and, by the hypothesis of this theorem, the set defined by $0 < \|x - x_*\| \le \delta/2$ does not contain solutions of (1). Therefore, this set does not contain limit points. So, the only limit point $\bar x$ that satisfies $\|\bar x - x_*\| \le \delta/2$ is $x_*$. So, by (33), the sequence converges to $x_*$ as we wanted to prove. □
Theorem 4 is our second local convergence theorem. For proving it we need a new definition and a technical lemma. We say that $x_* \in \mathbb{R}^n$ is a strongly isolated solution of (1) if $F(x_*) = 0$ and there exists $\varepsilon > 0$ such that
$$ 0 < \|x - x_*\| \le \varepsilon \ \Rightarrow \ \langle J(x) F(x), F(x) \rangle \ne 0. $$
That is, in a reduced neighborhood of a strongly isolated solution the residual $F(x)$ is not orthogonal to the gradient $J(x)^t F(x)$. Theorem 4 says that, if the initial point $x_0$ is close enough to a strongly isolated solution $x_*$, then the sequence $\{x_k\}$ converges to $x_*$. Observe that this cannot be deduced from Theorem 3 and, moreover, Theorem 3 cannot be deduced from this result either, since the strong isolation assumption is not necessary to prove that theorem.

Lemma 1. Assume that $F(x_*) = 0$ and $k_0 \in \mathbb{N}$. Then, for all $\delta_1 > 0$, there exists $\delta \in (0, \delta_1]$ such that for any possible initial point $x_0$ such that $\|x_0 - x_*\| < \delta$, the $k_0$-th iterate computed by the DF-SANE algorithm will satisfy $\|x_{k_0} - x_*\| < \delta_1$, independently of the choices of $\alpha_k, \sigma_k$ for $k = 0, 1, \ldots, k_0 - 1$.

Proof. We proceed by induction. If $k_0 = 0$ the result is trivial with $\delta = \delta_1$. Assume that it is true for $k = 0, 1, \ldots, k_0$ and let us prove it for $k_0 + 1$. Let $\delta_1 > 0$. By the continuity of $F$, since $F(x_*) = 0$ and $|\sigma_{k_0} \alpha_{k_0}| \le \sigma_{max}$, there exists
$$ \delta_2 \in (0, \delta_1/2) \qquad (34) $$
such that
$$ \|x_{k_0} - x_*\| < \delta_2 \ \Rightarrow \ \|x_{k_0+1} - x_{k_0}\| < \delta_1/2. \qquad (35) $$
But, by the inductive hypothesis, there exists $\delta \in (0, \delta_2]$ such that
$$ \|x_0 - x_*\| < \delta \ \Rightarrow \ \|x_{k_0} - x_*\| < \delta_2. \qquad (36) $$
By (34), (35) and (36), if $\|x_0 - x_*\| < \delta$, we have:
$$ \|x_{k_0+1} - x_*\| \le \|x_{k_0} - x_*\| + \|x_{k_0+1} - x_{k_0}\| < \delta_2 + \delta_1/2 < \delta_1. $$
This completes the proof. □

Theorem 4. Assume that the sequence $\{x_k\}_{k \in \mathbb{N}}$ is generated by the DF-SANE algorithm and $x_*$ is a strongly isolated solution. Then, there exists $\delta > 0$ such that
$$ \|x_0 - x_*\| < \delta \ \Rightarrow \ \lim_{k \to \infty} x_k = x_*. $$

Proof. Let $\varepsilon > 0$ be such that
$$ 0 < \|x - x_*\| \le \varepsilon \ \Rightarrow \ \langle J(x) F(x), F(x) \rangle \ne 0. \qquad (37) $$
Since $|\sigma_k \alpha_k| \le \sigma_{max}$, $\|x_{k+1} - x_k\| \le |\sigma_k \alpha_k| \|F(x_k)\|$ and $F(x_*) = 0$, the continuity of $F$ implies that there exists $\varepsilon_1 \in (0, \varepsilon/2]$ such that
$$ \|x_k - x_*\| \le \varepsilon_1 \ \Rightarrow \ \|x_{k+1} - x_k\| < \varepsilon/2. \qquad (38) $$
Define
$$ C_\varepsilon = \{x \in \mathbb{R}^n \mid \varepsilon_1 \le \|x - x_*\| \le \varepsilon\}. $$
Since $C_\varepsilon$ is compact and $f$ is continuous, there exist $\hat x \in C_\varepsilon$ and $\beta > 0$ such that
$$ \beta \equiv f(\hat x) \le f(x) \quad \text{for all } x \in C_\varepsilon. \qquad (39) $$
Since $f$ is continuous, the set $\{x \in B(x_*, \varepsilon_1/2) \mid f(x) < \beta/2\}$ is an open neighborhood of $x_*$. Therefore, there exists $\delta_1 \in (0, \varepsilon_1/2)$ such that
$$ \|x - x_*\| \le \delta_1 \ \Rightarrow \ f(x) < \beta/2. \qquad (40) $$
Let $\bar k_0 \in K_+$ be such that
$$ \sum_{k = \bar k_0}^{\infty} \eta_k < \beta/2. \qquad (41) $$
By Proposition 2, there exists $k_0 \ge \bar k_0$ ($k_0 \le \bar k_0 + M$) such that
$$ f(x_k) \le f(x_{\bar k_0}) + \frac{\beta}{2} \quad \text{for all } k \ge k_0. \qquad (42) $$
Applying Lemma 1 for $\bar k_0$ and $k_0$, we deduce that there exists $\delta \in (0, \delta_1)$ such that
$$ \|x_0 - x_*\| \le \delta \ \Rightarrow \ \|x_{\bar k_0} - x_*\| \le \delta_1 < \varepsilon_1 \qquad (43) $$
and
$$ \|x_0 - x_*\| \le \delta \ \Rightarrow \ \|x_{k_0} - x_*\| \le \delta_1 < \varepsilon_1. \qquad (44) $$
By (40) and (43), $f(x_{\bar k_0}) < \beta/2$. So, by (42),
$$ f(x_k) < \beta \quad \text{for all } k \ge k_0. \qquad (45) $$
Let us prove by induction that, choosing $\|x_0 - x_*\| \le \delta$,
$$ \|x_{k_0+j} - x_*\| < \varepsilon_1 \qquad (46) $$
for all $j \in \mathbb{N}$. By (44), we have that (46) is true for $j = 0$. Assume, as inductive hypothesis, that, for some $j \ge 1$, $\|x_{k_0+j-1} - x_*\| < \varepsilon_1$. Then, $\|x_{k_0+j-1} - x_*\| < \varepsilon/2$. But, by (38),
$$ \|x_{k_0+j} - x_*\| \le \|x_{k_0+j-1} - x_*\| + \|x_{k_0+j} - x_{k_0+j-1}\| < \varepsilon/2 + \varepsilon/2 = \varepsilon. \qquad (47) $$
Since, by (45), $f(x_{k_0+j}) < \beta$, (39) and (47) imply that $\|x_{k_0+j} - x_*\| \le \varepsilon_1$. This completes the inductive proof.

So, $\{x_k\}_{k \ge k_0} \subset B(x_*, \varepsilon_1)$. Therefore, all the limit points $\bar x$ of $\{x_k\}_{k \in \mathbb{N}}$ are such that $\|\bar x - x_*\| \le \varepsilon_1 < \varepsilon$. But, by (37) and Theorem 1, the only possible limit point is $x_*$. Therefore, $\lim_{k \to \infty} x_k = x_*$ as we wanted to prove. □

Corollary 4. Assume that the sequence $\{x_k\}_{k \in \mathbb{N}}$ is generated by the DF-SANE algorithm and $J(x_*)$ is either positive definite or negative definite. Then, there exists $\delta > 0$ such that
$$ \|x_0 - x_*\| < \delta \ \Rightarrow \ \lim_{k \to \infty} x_k = x_*. $$

Proof. Using the continuity of $J$ we obtain that $x_*$ is strongly isolated. Then, the thesis follows from Theorem 4. □
4  Numerical results
We implemented DF-SANE with the following parameters: $nexp = 2$, $\sigma_{min} = 10^{-10}$, $\sigma_{max} = 10^{10}$, $\sigma_0 = 1$, $\tau_{min} = 0.1$, $\tau_{max} = 0.5$, $\gamma = 10^{-4}$, $M = 10$, and $\eta_k = \|F(x_0)\| / (1+k)^2$ for all $k \in \mathbb{N}$. The spectral steplength was computed by the formula
$$ \sigma_k = \frac{\langle s_k, s_k \rangle}{\langle s_k, y_k \rangle}, $$
where $s_k = x_{k+1} - x_k$ and $y_k = F(x_{k+1}) - F(x_k)$. Observe that
$$ y_k = \left[ \int_0^1 J(x_k + t s_k)\, dt \right] s_k, $$
so $\sigma_k$ is the inverse of the Rayleigh quotient
$$ \frac{\left\langle \left[ \int_0^1 J(x_k + t s_k)\, dt \right] s_k,\; s_k \right\rangle}{\langle s_k, s_k \rangle}. $$
However, if $|\sigma_k| \notin [\sigma_{min}, \sigma_{max}]$, we replace the spectral coefficient by
$$ \sigma_k = \begin{cases} 1 & \text{if } \|F(x_k)\| > 1, \\ \|F(x_k)\|^{-1} & \text{if } 10^{-5} \le \|F(x_k)\| \le 1, \\ 10^{5} & \text{if } \|F(x_k)\| < 10^{-5}. \end{cases} $$
Since we use big values for $\sigma_{max}$ and $1/\sigma_{min}$, this replacement rarely occurs. In the few cases in which the replacement is necessary, the first trial point is $x_k - F(x_k)$ if $\|F(x_k)\| \ge 1$. If $10^{-5} \le \|F(x_k)\| \le 1$, the step $\sigma_k$ is such that the distance between $x_k$ and the first trial point is equal to 1. When $\|F(x_k)\| < 10^{-5}$ we prefer to allow the distance between $x_k$ and the trial point to be smaller, choosing for $\sigma_k$ the fixed value $10^{5}$.

For choosing $\alpha_{+new}$ and $\alpha_{-new}$ at Step 2, we proceed as follows. Given $\alpha_+ > 0$, we take $\alpha_{+new} > 0$ as
$$ \alpha_{+new} = \begin{cases} \tau_{min}\, \alpha_+ & \text{if } \alpha_t < \tau_{min}\, \alpha_+, \\ \tau_{max}\, \alpha_+ & \text{if } \alpha_t > \tau_{max}\, \alpha_+, \\ \alpha_t & \text{otherwise}, \end{cases} $$
where
$$ \alpha_t = \frac{\alpha_+^2\, f(x_k)}{f(x_k + \alpha_+ d) + (2\alpha_+ - 1) f(x_k)}. $$
We use similar formulae for choosing $\alpha_{-new}$ as a function of $\alpha_-$, $f(x_k)$ and $f(x_k - \alpha_- d)$. This parabolic model is similar to the one described in [21, pp. 142-143], in which the Jacobian matrix at $x_k$ is replaced by the identity matrix (see also [12]).

We also implemented SANE [22] with the following parameters: $\gamma = 10^{-4}$, $\varepsilon = 10^{-8}$, $\sigma_1 = 0.1$, $\sigma_2 = 0.5$, $\alpha_0 = 1$, $M = 10$, and
$$ \delta = \begin{cases} 1 & \text{if } \|F(x_k)\| > 1, \\ \|F(x_k)\|^{-1} & \text{if } 10^{-5} \le \|F(x_k)\| \le 1, \\ 10^{5} & \text{if } \|F(x_k)\| < 10^{-5}. \end{cases} $$
Both in SANE and DF-SANE we stop the process when
$$ \frac{\|F(x_k)\|}{\sqrt{n}} \le e_a + e_r \frac{\|F(x_0)\|}{\sqrt{n}}, \qquad (48) $$
where $e_a = 10^{-5}$ and $e_r = 10^{-4}$.

We ran SANE and DF-SANE using a set of large-scale test problems. All the test problems are fully described in Appendix A. The numerical results are shown in Tables 1, 2, and 3. We report only one failure, marked with the symbol (*), when running problem 18 with $n = 100$. In that case, DF-SANE fails because it generates a sequence that converges to a point $\bar x$ at which $F(\bar x)^T g(\bar x) = 0$, but $F(\bar x) \ne 0$ and $g(\bar x) \ne 0$.

The results from Tables 1 and 2 are summarized in Table 3. In Table 3 we compare the performance (number of problems for which each method is a winner with respect to number of iterations, function evaluations and computer time) between SANE and DF-SANE. In Tables 1 and 2 we report the problem number and the dimension of the problem (Function(n)), the number of iterations (IT), the number of function evaluations, including the additional function evaluations that SANE uses for approximating directional derivatives (FE), the number of backtrackings (BK), and the CPU time in seconds (T).

In SANE it is necessary to evaluate the directional derivative $\langle F(x_k), J(x_k)^t F(x_k) \rangle$ at each iteration. Since we assume that the Jacobian is not easily available, we use the fact that $\langle F(x_k), J(x_k)^t F(x_k) \rangle = \langle J(x_k) F(x_k), F(x_k) \rangle$ and the approximation
$$ J(x_k) F(x_k) \approx \frac{F(x_k + t F(x_k)) - F(x_k)}{t}, $$
where $t > 0$ is a small parameter. Therefore, computing the approximate directional derivative involves an additional function evaluation (included in FE) at each iteration.

Our results indicate that the new fully derivative-free scheme DF-SANE is competitive with the SANE algorithm, which in turn is quite frequently preferable to the Newton-Krylov methods Newton-GMRES, Newton-BiCGSTAB, and Newton-TFQMR (see [22] for comparisons). In particular, when comparing SANE with Newton-GMRES (which was the Krylov-like method with the best performance in [22]), the summary results shown in Table 4 were obtained.
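A hedged Python sketch of the two DF-SANE implementation choices just described, the safeguarded spectral coefficient and the parabolic model for the new trial step (function names are ours; the constants are the ones reported above):

```python
import numpy as np

def spectral_coefficient(s, y, F_norm, sigma_min=1e-10, sigma_max=1e10):
    """sigma_k = <s,s>/<s,y>, replaced by the safeguard when |sigma_k| leaves [sigma_min, sigma_max]."""
    denom = float(np.dot(s, y))
    sigma = float(np.dot(s, s)) / denom if denom != 0.0 else np.inf
    if not (sigma_min <= abs(sigma) <= sigma_max):
        if F_norm > 1.0:
            sigma = 1.0
        elif F_norm >= 1e-5:
            sigma = 1.0 / F_norm   # first trial point at distance 1 from x_k
        else:
            sigma = 1e5            # trial distance 1e5 * ||F(x_k)|| < 1
    return sigma

def alpha_new(alpha, f_k, f_trial, tau_min=0.1, tau_max=0.5):
    """Parabolic model alpha_t, safeguarded into [tau_min*alpha, tau_max*alpha].
    Assumes the denominator is positive, which holds in the typical rejected-step case."""
    alpha_t = alpha ** 2 * f_k / (f_trial + (2.0 * alpha - 1.0) * f_k)
    return min(max(alpha_t, tau_min * alpha), tau_max * alpha)
```

In the DF-SANE sketch of Section 3, these two helpers would replace, respectively, the crude fallback for $\sigma_k$ and the fixed halving of the trial steps.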
Function(n) 1( 1000) 1(10000) 2( 500) 2( 2000) 3( 100) 3( 500) 4( 99) 4( 999) 5( 9) 5( 49) 6( 100) 6(10000) 7( 100) 7(10000) 8( 1000) 8(10000) 9( 100) 9( 1000) 10( 100) 10( 500) 11( 99) 11( 399) 12( 1000) 12(10000) 13( 100) 13( 1000) 14( 2500) 14(10000) 15( 5000) 15(15000) 16( 500) 16( 2000) 17( 100) 17( 1000) 18( 50) 18( 100) 19( 1000) 19(50000) 20( 100) 20( 1000) 21( 399) 21( 9999) 22( 1000) 22(15000)
IT 5 2 6 2 5 1 130 130 23 552 2 2 23 23 1 1 6 6 1 1 11 11 6 5 3 4 11 12 5 5 14 16 9 7 24 24 5 5 32 51 4 4 1 1
SANE FE BK 10 0 4 0 14 1 7 1 10 0 2 0 335 69 335 69 59 12 1942 424 5 1 5 1 49 2 49 2 2 0 2 0 12 0 12 0 8 1 8 1 34 4 34 4 14 2 12 2 8 1 10 1 25 1 28 1 10 0 10 0 29 1 32 0 19 1 15 1 50 2 49 1 10 0 10 0 67 2 117 9 9 1 9 1 2 0 2 0
T .010 .060 .010 .010 .010 .000 .060 .611 .000 .070 .000 .040 .010 .581 .000 .030 .040 3.826 .000 .010 .000 .020 .040 .421 .000 .020 .210 1.082 .060 .230 .000 .010 .010 .030 .010 .000 .010 .771 .010 .100 .010 .200 .000 .030
IT 5 2 11 11 5 1 99 101 42 732 3 3 23 23 1 1 6 6 2 2 17 17 30 23 3 4 11 12 5 5 14 16 9 7 19 * 5 5 40 44 5 5 1 1
DF-SANE FE BK 5 0 2 0 11 0 11 0 5 0 1 0 289 66 325 71 68 12 2958 660 3 0 3 0 29 2 29 2 1 0 1 0 6 0 6 0 12 1 12 1 49 7 49 7 62 12 59 11 7 1 8 1 17 1 20 1 5 0 5 0 16 1 16 0 11 1 9 1 21 1 * * 5 0 5 0 42 1 62 5 7 1 7 1 2 0 2 0
T .000 .050 .000 .030 .000 .010 .060 .611 .000 .130 .000 .060 .000 .511 .000 .030 .020 2.063 .000 .010 .000 .030 .180 2.073 .010 .010 .160 .871 .050 .180 .010 .010 .000 .030 .000 * .010 .611 .010 .070 .000 .190 .000 .040
Table 1: SANE vs. DF-SANE for the first set of test problems.
Function(n) 23( 500) 23(1000) 24( 500) 24( 1000) 25( 100) 25( 500) 26( 1000) 26( 10000) 27( 50) 27( 100) 28( 100) 28(1000) 29( 100) 29(1000) 30( 99) 30(9999) 31( 1000) 31( 5000) 32( 500) 32( 1000) 33( 1000) 33( 5000) 34( 1000) 34(5000) 35( 1000) 35( 5000) 36( 1000) 36(5000) 37( 1000) 37(5000) 38( 1000) 38(5000) 39(1000) 39(5000) 40(1000) 40(5000) 41( 500) 41(1000) 42( 1000) 42( 5000) 43( 100) 43( 500) 44( 1000) 44(5000)
IT 1 1 25 265 2 3 1 1 10 11 1 1 1 1 18 18 4 4 6 6 3 3 22 12 21 29 21 44 23 23 19 19 55 55 1 1 7 2 110 110 80 488 2 2
SANE FE BK 10 1 11 1 54 4 915 159 6 1 9 1 2 0 2 0 20 0 22 0 2 0 2 0 4 1 4 1 39 3 39 3 9 0 9 0 12 0 12 0 20 2 22 1 52 4 27 1 45 2 63 3 45 2 96 7 49 2 49 2 40 2 40 2 126 13 126 13 2 0 2 0 15 1 5 1 268 45 268 45 175 11 1704 348 4 0 4 0
T .000 .000 .030 .951 .000 .000 .000 .020 .260 1.072 .000 .000 .010 .010 .000 .791 .030 .160 .010 .020 .050 .270 .110 .361 .180 1.402 .270 2.954 .010 .140 .050 .320 .160 1.041 .000 .020 .010 .010 .190 1.392 .010 .601 .020 .100
IT 2 2 54 17 2 3 1 1 10 11 1 1 1 1 11 11 6 6 6 6 37 4 78 12 21 38 28 26 26 26 25 25 14 14 1 1 7 3 173 173 86 586 4 3
DF-SANE FE BK 18 1 20 1 109 18 25 3 6 1 9 1 1 0 1 0 10 0 11 0 1 0 1 0 5 1 5 1 16 2 16 2 6 0 6 0 7 0 7 0 50 3 16 2 155 26 18 1 27 2 48 3 34 2 36 4 38 5 38 5 30 2 30 2 20 1 20 1 1 0 1 0 9 1 3 0 412 85 412 85 108 9 1162 193 4 0 3 0
T .010 .000 .070 .030 .000 .000 .000 .020 .140 .561 .000 .000 .000 .010 .000 .411 .020 .130 .010 .020 .120 .230 .381 .280 .110 1.202 .210 1.272 .010 .210 .040 .340 .030 .210 .000 .020 .010 .000 .330 2.654 .010 .451 .030 .090
Table 2: SANE vs. DF-SANE for the second set of test problems.
18
Method       IT   FE    T
SANE         37   19   20
DF-SANE      10   64   38
Undecided    41    5   30
Table 3: Winners with respect to iterations, evaluations and time.
Method          IT   FE    T
Newton-GMRES    51    9   19
SANE             9   51   41
Table 4: Winners with respect to iterations, evaluations and time between SANE and Newton-GMRES reported in [22].
5  Conclusions
The algorithm presented in this paper may be considered a damped quasi-Newton method for solving nonlinear systems; see [12, 26]. The iterations are
$$ x_{k+1} = x_k - \alpha_k B_k^{-1} F(x_k), \qquad (49) $$
where the Jacobian approximation $B_k$ has the very simple form
$$ B_k = \frac{1}{\sigma_k} I. \qquad (50) $$
In most cases,
$$ \sigma_k = \frac{\langle s_k, s_k \rangle}{\langle y_k, s_k \rangle}. \qquad (51) $$
2. If some limit point of {xk }k∈IN is a solution of (1), then every limit point is a solution; 3. If a limit point x∗ of {xk }k∈IN is an isolated solution then the whole sequence converges to x∗ ; 4. If the initial point x0 is close enough to some strongly isolated solution x∗ then the whole sequence converges to x∗ . These results are obtained without using the specific formula of σk employed in our experiments. However, the method does not behave well for every choice of σk . Therefore, much has to be said, from the theoretical point of view, to explain the behavior of algorithms associated to the safeguarded spectral choice of the steplength used here. In particular, although the theoretical properties of the Barzilai-Borwein method for minimizing convex quadratics are now well understood (see [30]), nothing is known about the properties of the spectral residual method for solving nonsymmetric linear systems. Global and local convergence theorems, as the ones presented in this paper, smooth the path for proving results on the order of convergence. Nevertheless, it is necessary to understand what happens in the linear case first. Since the spectral residual method is a quasi-Newton method where the Jacobian approximation is a multiple of the identity matrix, the best behavior of this method must be expected when true Jacobians are close to matrices of that type. The analogy with the Barzilai-Borwein method allows one to conjecture in which (more general) situations the method should behave well. If the Jacobians are close to symmetric matrices with clustered eigenvalues (see [27]) a good behavior of the Barzilai-Borwein method can be predicted and, so, we also predict a fine behavior of the spectral residual method. Very likely, in many practical situations the performance of the method should be improved using some kind of preconditioning that transforms the Jacobian on a matrix with a small number of clusters of eigenvalues. So, with respect to preconditioning features, the situation is analogous to the one of Krylov-subspace methods. Preconditioned spectral gradient method for minimization were introduced in [25]. Our first set of experiments are discretization of boundary value problems. In general, the Jacobians are positive definite, so that the mappings F are generally monotone, or even strictly monotone. According to the corollaries of Theorem 1 this favors the behavior of DF-SANE, but also favors the behavior of almost every nonlinear-system solver. Since we are not using preconditioning at all, in general eigenvalues are not clustered. In some problems the Jacobians are well conditioned and in some other problems they are not. Moreover, in some problems the Jacobian is singular at the solution. In principle illconditioning affects adversely both spectral methods as Krylov subspace methods. The second set of 22 problems does not show special characteristics from the point of view of positive definiteness or conditioning. Moreover, some of these problems have many solutions. In principle, we do not have strong reasons to predict a good behavior of DF-SANE, therefore the rather robust and efficient performance of the new algorithm for solving these problems is a pleasantly surprising fact that needs theoretical explanation. Acknowledgment We are indebted to Ra´ ul Vignau, for his careful reading of the first draft of this paper. We are also indebted to an anonymous referee. 20
Appendix A: Test functions We now list the test functions F1 (x) = (f1 (x), . . . , fn (x))t and the associated initial guess x0 . 1. Exponential function 1 f1 (x) = ex1 −1 − 1,
fi (x) = i exi −1 − xi ,
Initial guess: x0 =
n n n n−1 , n−1 , . . . , n−1
t
for i = 2, 3, . . . , n.
.
2. Exponential function 2 f1 (x) = ex1 − 1, i xi (e + xi−1 − 1) , fi (x) = 10
Initial guess: x0 =
1 , 1 , . . . , n12 n2 n2
t
for i = 2, 3, . . . , n.
.
3. Exponential function fi (x) = fn (x) =
Initial guess: x0 =
i 2 1 − x2i − e−xi , 10 n 2 1 − e−xn . 10
1 2 n 4n2 , 4n2 , . . . , 4n2
t
for i = 2, 3, . . . , n − 1,
.
4. Diagonal function premultiplied by a quasi-orthogonal matrix ( n is a multiple of 3) [17, pp. 89-90] For i = 1, 2, . . . , n/3, f3i−2 (x) = 0.6x3i−2 + 1.6x33i−2 − 7.2x23i−1 + 9.6x3i−1 − 4.8,
f3i−1 (x) = 0.48x3i−2 − 0.72x33i−1 + 3.24x23i−1 − 4.32x3i−1 − x3i + 0.2x33i + 2.16, f3i (x) = 1.25x3i − 0.25x33i .
t
Initial guess: x0 = −1, 12 , −1, . . . , −1, 12 , −1 . 5. Extended Rosenbrock function ( n is even) [17, p. 89] For i = 1, 2, . . . , n/2,
f2i−1 (x) = 10 x2i − x22i−1 , f2i (x) = 1 − x2i−1 . Initial guess: x0 = (5, 1, . . . , 5, 1)t . 21
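As a concrete illustration of how these test problems can be coded, the following Python residual for the Extended Rosenbrock system (problem 5) and its initial guess are a hedged sketch in our own notation (not the authors' code); the callable can be passed directly to a DF-SANE implementation such as the one sketched in Section 3.

```python
import numpy as np

def extended_rosenbrock(x):
    """Problem 5: f_{2i-1} = 10*(x_{2i} - x_{2i-1}^2), f_{2i} = 1 - x_{2i-1}, n even."""
    x = np.asarray(x, dtype=float)
    f = np.empty_like(x)
    f[0::2] = 10.0 * (x[1::2] - x[0::2] ** 2)   # odd-indexed equations (1-based indexing)
    f[1::2] = 1.0 - x[0::2]                     # even-indexed equations (1-based indexing)
    return f

def x0_extended_rosenbrock(n):
    """Initial guess (5, 1, 5, 1, ..., 5, 1)."""
    x0 = np.empty(n)
    x0[0::2], x0[1::2] = 5.0, 1.0
    return x0
```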
6. Chandrasekhar’s H-equation [21, p. 198]
F6 (H)(µ) = H(µ) − 1 − The discretized version is:
c 2
1 µH(ν)
µ+ν
0
⎛
−1
dµ
= 0.
⎞−1
n c µi xj ⎠ fi (x) = xi − ⎝1 − 2n j=1 µi + µj
,
for i = 1, 2, . . . , n,
with c ∈ [0, 1) and µi = (i − 1/2)/n, for 1 ≤ i ≤ n. (In our experiments we take c = 0.9). Initial guess: x0 = (1, 1, . . . , 1)t . 7. Badly scaled augmented Powell’s function ( n is a multiple of 3) [17, p. 89] For i = 1, 2, . . . , n/3, f3i−2 (x) = 104 x3i−1 x3i−1 − 1, f3i−1 (x) = exp (−x3i−2 ) + exp (−x3i−1 ) − 1.0001, f3i (x) = φ(x3i ), with
⎧ ⎪ −2 ⎨ 0.5t −592t3 + 888t2 + 4551t − 1924 /1998, φ(t) = ⎪ ⎩
0.5t + 2
t
t≤1 −1 < t < 2 t ≥ 2.
Initial guess: x0 = 10−3 , 18, 1, 10−3 , 18, 1, . . . . 8. Trigonometric function ⎛
fi (x) = 2 ⎝n + i(1 − cos xi ) − sin xi −
n
⎞
cos xj ⎠ (2 sin xi − cos xi ) .
j=1
Initial guess: x0 =
101 101 100n , . . . , 100n
t
.
9. Singular function 1 3 1 2 x + x 3 1 2 2 1 i 1 fi (x) = − x2i + x3i + x2i+1 , 2 3 2 1 2 n 3 fn (x) = − xn + xn . 2 3 f1 (x) =
for i = 2, 3, . . . , n − 1,
Initial guess: x0 = (1, 1, . . . , 1)t . 10. Logarithmic function fi (x) = ln(xi + 1) − Initial guess: x0 = (1, 1, . . . , 1)t . 22
xi , n
for i = 1, 2, . . . , n.
11. Broyden Tridiagonal function [18, pp. 471-472] f1 (x) = (3 − 0.5x1 )x1 − 2x2 + 1, fi (x) = (3 − 0.5xi )xi − xi−1 − 2xi+1 + 1,
for i = 2, 3, . . . , n − 1,
fn (x) = (3 − 0.5xn )xn − xn−1 + 1. Initial guess: x0 = (−1, −1, . . . , −1)t . 12. Trigexp function [18, p. 473] f1 (x) = 3x31 + 2x2 − 5 + sin(x1 − x2 ) sin(x1 + x2 ), fi (x) = −xi−1 e(xi−1 −xi ) + xi (4 + 3x2i ) + 2xi+1 + sin(xi − xi+1 ) sin(xi + xi+1 ) − 8, (xn−1 −xn )
fn (x) = −xn−1e
for i = 2, 3, . . . , n − 1,
+ 4xn − 3.
Initial guess: x0 = (0, 0, . . . , 0)t . 13. Variable band function 1 [18, p. 474] f1 (x) = −2x21 + 3x1 − 2x2 + 0.5xα1 + 1,
fi (x) = −2x21 + 3xi − xi−1 − 2xi+1 + 0.5xαi + 1,
fn (x) =
−2x2n
for i = 2, 3, . . . , n − 1,
+ 3xn − xn−1 + 0.5xαn + 1,
and αi is a random integer number in [αimin , αimax ], where αimin = max[1, i − 2] and αimax = min[n, i + 2], for all i. Initial guess: x0 = (0, 0, . . . , 0)t . 14. Variable band function 2 [18, p. 474] f1 (x) = −2x21 + 3x1 − 2x2 + 0.5xα1 + 1,
fi (x) = −2x21 + 3xi − xi−1 − 2xi+1 + 0.5xαi + 1,
fn (x) =
−2x2n
for i = 2, 3, . . . , n − 1,
+ 3xn − xn−1 + 0.5xαn + 1,
and αi is a random integer number in [αimin , αimax ], where αimin = max[1, i − 10] and αimax = min[n, i + 10], for all i. Initial guess: x0 = (0, 0, . . . , 0)t . 15. Function 15 [18, p. 475] f1 (x) = −2x21 + 3x1 + 3xn−4 − xn−3 − xn−2 + 0.5xn−1 − xn + 1,
fi (x) = −2x2i + 3xi − xi−1 − 2xi+1 + 3xn−4 − xn−3 − xn−2 + 0.5xn−1 −xn + 1,
fn (x) =
−2x2n
for i = 2, 3, . . . , n − 1,
+ 3xn − xn−1 + 3xn−4 − xn−3 − xn−2 + 0.5xn−1 − xn + 1.
Initial guess: x0 = (−1, −1, . . . , −1)t .
23
16. Strictly convex function 1 [31, p. 29] n
x1 i=1 (e
F (x) is the gradient of h(x) =
− xi ).
fi (x) = exi − 1,
Initial guess: x0 =
t
1 2 n, n, . . . , 1
for i = 1, 2, . . . , n.
.
17. Strictly convex function 2 [31, p. 30] F (x) is the gradient of h(x) = fi (x) =
n
i i=1 10
(ex1 − xi ).
i xi (e − 1) , 10
for i = 1, 2, . . . , n.
Initial guess: x0 = (1, 1, . . . , 1)t . 18. Function 18 ( n is a multiple of 3) For i = 1, 2, . . . , n/3, f3i−2 (x) = x3i−2 x3i−1 − x23i − 1,
f3i−1 (x) = x3i−2 x3i−1 x3i − x23i−2 + x23i−1 − 2, f3i (x) = e−x3i−2 − e−x3i−1 .
Initial guess: x0 = (0, 0, . . . , 0)t . 19. Zero Jacobian function f1 (x) =
n j=1
x2j ,
fi (x) = −2x1 xi , for i = 2, . . . , n. Initial guess: x10 =
100(n−100) , n
and for all i ≥ 2, xi0 =
(n−1000)(n−500) . (60n)2
20. Geometric programming function fi (x) =
5 t=1
⎡ ⎣0.2 t x0.2t−1 i
n k=1,k=i
⎤ ⎦. x0.2t k
Initial guess: x0 = (1, 1, . . . , 1)t . 21. Function 21 (n is a multiple of 3). For i = 1, 2, . . . , n/3: f3i−2 (x) = x3i−2 x3i−1 − x23i − 1,
f3i−1 (x) = x3i−2 x3i−1 x3i − x23i−2 + x23i−1 − 2, f3i (x) = e−x3i−2 − e−x3i−1 .
Initial guess: x0 = (1, 1, . . . , 1)t . 24
22. Linear function-full rank. fi (x) = xi − (2/n)
n
xj + 1.
j=1
Initial guess: x0 = (100, 100, . . . , 100)t . 23. Linear function-rank 2. f1 (x) = x1 − 1, fi (x) = i
n
jxj − i, for i = 2, 3, . . . , n.
j=1
t
Initial guess: x0 = 1, n1 , . . . , n1 . 24. Penalty I function. fi (x) =
√
n 1 1 x2j − . 4n j=1 4
fn (x) =
Initial guess: x0 =
10−5 (xi − 1), for i = 1, 2, . . . , n − 1,
1 1 1 3, 3, . . . , 3
t
.
25. Brown almost function. fi (x) = xi +
n
xj − (n + 1), for i = 1, 2, . . . , n − 1,
j=1
fn (x) =
n
xj − 1.
j=1
Initial guess: x0 =
n−1 n−1 n−1 n , n ,..., n
t
.
26. Variable dimensioned function. fi (x) = xi − 1, for i = 1, 2, . . . , n − 2, fn−1 (x) =
n−2
j(xj − 1),
j=1
⎛
fn (x) = ⎝
n−2
⎞2
j(xj − 1)⎠ .
j=1
t
Initial guess: x0 = 1 − n1 , 1 − n2 , . . . , 0 .
25
27. Geometric function. 5
fi (x) =
⎛ ⎝(t/5)x(t/5−1) i
t=1
⎞
n k=1,k=i
t/5 xk ⎠ .
Initial guess: x0 = (0.9, 0.9, . . . , 0.9)t . 28. Extended Powel singular function (n is a multiple of 4) [28]. For i = 1, 2, . . . , n/4, f4i−3 (x) = x4i−3 + 10x4i−2 , √ 5(x4i−1 − x4i ), f4i−2 (x) =
f4i−1 (x) = (x4i−2 − 2x4i−1 )2 , √ f4i (x) = 10(x4i−3 − x4i )2 .
t
Initial guess: x0 = 1.5 × 10−4 , 1.5 × 10−4 , . . . , 1.5 × 10−4 . 29. Function 27. f1 (x) =
n j=1
x2j ,
fi (x) = −2x1 xi , for i = 2, 3, . . . , n.
Initial guess: x0 = 100, n12 , . . . , n12
t
.
30. Tridimensional valley function (n is a multiple of 3) [16]. For i = 1, 2, . . . , n/3:
(c2 x33i−2
f3i−2 (x) =
+ c1 x3i−2 ) exp
−x23i−2 100
− 1,
f3i−1 (x) = 10(sin(x3i−2 ) − x3i−1 ), f3i (x) = 10(cos(x3i−2 ) − x3i ), donde c1 = 1.003344481605351 c2 = −3.344481605351171 × 10−3 Initial guess: x0 = (2, 1, 2, 1, 2, . . .)t . 31. Complementary function (n is even). For i = 1, 2, . . . , n/2,
x22i−1
f2i−1 (x) =
+ x2i−1 e
−x2i−1 ex2i−1 +
f2i (x) =
x2i−1
1 − n
2 1/2
1 , n
x22i + (3xi + sin(x2i ) + ex2i )2 x2i
−3x2i − sin(x2i ) − e 26
.
1/2
− x2i−1
− x2i
Initial guess: x0 = (0.5, . . . , 0.5)t . 32. Minimal function. fi (x) =
(ln xi − exp(xi ))2 + 10−10 . 2
(ln xi + exp(xi )) −
Initial guess: x0 = (1, . . . , 1)t . 33. Hanbook function. ⎛
fi (x) = 0.05(xi − 1) + 2 sin ⎝
n
(xj − 1) +
j=1
⎛
(1 + 2(xi − 1)) + 2 sin ⎝
n
n
⎞
(xj − 1)2 ⎠
j=1
⎞
(xj − 1)⎠ .
j=1
Initial guess: x0 = (5, . . . , 5)t . 34. Tridiagonal system [23]. f1 (x) = 4(x1 − x22 ),
fi (x) = 8xi (x2i − xi−1 ) − 2(1 − xi ) + 4(xi − x2i+1 ), for i = 2, . . . , n − 1
fn (x) = 8xn (x2n − xn−1 ) − 2(1 − xn ). Initial guess: x0 = (12, . . . , 12)t . 35. Five-diagonal system [23]. f1 (x) = 4(x1 − x22 ) + x2 − x23 ,
f2 (x) = 8x2 (x22 − x1 ) − 2(1 − x2 ) + 4(x2 − x23 ) + x3 − x24 ,
fi (x) = 8xi (x2i − xi−1 ) − 2(1 − xi ) + 4(xi − x2i+1 ) + x2i−1 − xi−2 +xi+1 − x2i+2 , for i = 3, . . . , n − 2,
fn−1 (x) = 8xn−1 (x2n−1 − xn−2 ) − 2(1 − xn−1 ) + 4(xn−1 − x2n ) +x2n−2 − xn−3 ,
fn (x) = 8xn (x2n − xn−1 ) − 2(1 − xn ) + x2n−1 − xn−2 . Initial guess: x0 = (−2, . . . , −2)t . 36. Seven-diagonal system [23]. f1 (x) = 4(x1 − x22 ) + x2 − x23 + x3 − x24 ,
f2 (x) = 8x2 (x22 − x1 ) − 2(1 − x2 ) + 4(x2 − x23 ) + x21 + x3 − x24 +x4 − x25 ,
f3 (x) = 8x3 (x23 − x2 ) − 2(1 − x3 ) + 4(x3 − x24 ) + x22 − x1 + x4 −x25 + x21 + x5 − x26 ,
fi (x) = 8xi (x2i − xi−1 ) − 2(1 − xi ) + 4(xi − x2i+1 ) + x2i−1 27
−xi−2 + xi+1 − x2i+2 + x2i−2 + xi+2 − xi−3 −x2i+3 , for i = 4, . . . , n − 3,
fn−2 (x) = 8xn−2 (x2n−2 − xn−3 ) − 2(1 − xn−2 ) + 4(xn−2 − x2n−1 ) +x2n−3 − xn−4 + xn−1 − x2n + x2n−4 + xn − xn−5 ,
fn−1 (x) = 8xn−1 (x2n−1 − xn−2 ) − 2(1 − xn−1 ) + 4(xn−1 − x2n ) + x2n−2 −xn−3 + xn + x2n−3 − xn−4 ,
fn (x) = 8xn (x2n − xn−1 ) − 2(1 − xn ) + x2n−1 − xn−2 + x2n−2 − xn−3 . Initial guess: x0 = (−3, . . . , −3)t . 37. Extended Freudenstein and Roth function (n is even) [4]. For i = 1, 2, . . . , n/2, f2i−1 (x) = x2i−1 + ((5 − x2i )x2i − 2)x2i − 13, f2i (x) = x2i−1 + ((x2i + 1))x2i − 14)x2i − 29. Initial guess: x0 = (6, 3, 6, 3, . . . , 6, 3)t . 38. Extended Cragg and Levy problem (n is a multiple of 4) [28]. For i = 1, 2, . . . , n/4, f4i−3 (x) = (exp(x4i−3 ) − x4i−2 )2 , f4i−2 (x) = 10(x4i−2 − x4i−1 )3 , f4i−1 (x) = tan2 (x4i−1 − x4i ), f4i (x) = x4i − 1. Initial guess: x0 = (4, 2, 2, 2, 4, 2, 2, 2, . . .)t . 39. Extended Wood problem (n is a multiple of 4) [20]. For i = 1, 2, . . . , n/4,
f4i−3 (x) = −200x4i−3 x4i−2 − x24i−3 − (1 − x4i−3 ),
f4i−2 (x) = 200 x4i−2 − x24i−3 + 20(x4i−2 − 1) + 19.8(x4i − 1),
f4i−1 (x) = −180x4i−1 x4i − x24i−1 − (1 − x4i−1 ),
f4i (x) = 180 x4i − x24i−1 + 20.2(x4i − 1) + 19.8(x4i−2 − 1). Initial guess: x0 = (0, . . . , 0)t . 40. Tridiagonal exponential problem [4]. f1 (x) = x1 − exp(cos(h(x1 + x2 ))), fi (x) = xi − exp(cos(h(xi−1 + xi + xi+1 ))), for i = 2, . . . , n − 1 fn (x) = xn − exp(cos(h(xn−1 + xn ))), h = 1/(n + 1). Initial guess: x0 = (1.5, . . . , 1.5)t . 28
41. Discrete boundry value problem [28]. f1 (x) = 2x1 + 0.5h2 (x1 + h)3 − x2 , fi (x) = 2xi + 0.5h2 (xi + hi)3 − xi−1 + xi+1 , for i = 2, . . . , n − 1
fn (x) = 2xn + 0.5h2 (xn + hn)3 − xn−1 , h = 1/(n + 1).
Initial guess: x0 = (h(h − 1), h(2h − 1), . . . , h(nh − 1))t . 42. Brent problem [1]. f1 (x) = 3x1 (x2 − 2x1 ) + x22 /4,
fi (x) = 3xi (xi+1 − 2xi + xi−1 ) + (xi+1 − xi−1 )2 /4, for i = 2, . . . , n − 1
fn (x) = 3xn (20 − 2xn + xn−1 ) + (20 − xn−1 )2 /4, h = 1/(n + 1). Initial guess: x0 = (0, . . . , 0, 20, 20) t . 43. Troesch problem [32]. f1 (x) = 2x1 + ρh2 sinh(ρx1 ) − x2 ,
fi (x) = 2xi + ρh2 sinh(ρxi ) − xi−1 − xi+1 , for i = 2, . . . , n − 1
fn (x) = 2xn + ρh2 sinh(ρxn ) − xn−1 , ρ = 10, h = 1/(n + 1). Initial guess: x0 = (0, . . . , 0)t . 44. Trigonometric system [33].
fi (x) = 5 − (l + 1)(1 − cos xi ) − sin xi −
5l+5 j=5l+1
l = div(i − 1, 5).
Initial guess: x0 =
1 1 n, . . . , n
t
.
29
cos xj ,
Appendix B: Comparison of the SANE algorithm with Newton-Krylov methods

We compare the performance of the SANE algorithm, on a set of large-scale test problems, with some recent implementations of inexact Newton methods. In particular we compare with Inexact Newton with GMRES (ING), Inexact Newton with Bi-CGSTAB (INBC), and Inexact Newton with TFQMR (INT). For all these techniques, an Armijo line-search condition is used as a globalization strategy. For ING and INBC the Eisenstat-Walker formula is included [13]. The MATLAB code for all these methods was obtained from Kelley [21]. For all our experiments we use MATLAB 6.1 on a Pentium III personal computer at 700 MHz. In Appendix A we list the test functions, $F(x)$, and the associated initial guesses, $x_0$, used in our experiments.

The parameters used for the inexact Newton methods are fully described in [21]. For the SANE algorithm we use $\gamma = 10^{-4}$, $\varepsilon = 10^{-8}$, $\sigma_1 = 0.1$, $\sigma_2 = 0.5$, $\alpha_0 = 1$, $M = 10$,
$$ \delta = \begin{cases} 1 & \text{if } \|F_k\| > 1, \\ \|F_k\|^{-1} & \text{if } 10^{-5} \le \|F_k\| \le 1, \\ 10^{-5} & \text{if } \|F_k\| < 10^{-5}, \end{cases} $$
and the scalar $F_k^t J_k F_k$ is computed by
$$ F_k^t J_k F_k \approx F_k^t\, \frac{F(x_k + h F_k) - F_k}{h}, $$
with $h = 10^{-7}$. For choosing $\lambda$ at Step 7, we use the following parabolic model (Kelley [21, pp. 142-143]). Denoting by $\lambda_c > 0$ the current value of $\lambda$, we update $\lambda$ as follows:
$$ \lambda = \begin{cases} \sigma_1 \lambda_c & \text{if } \lambda_t < \sigma_1 \lambda_c, \\ \sigma_2 \lambda_c & \text{if } \lambda_t > \sigma_2 \lambda_c, \\ \lambda_t & \text{otherwise}, \end{cases} $$
where
$$ \lambda_t = \frac{-\lambda_c^2\, F(x_k)^t J(x_k) F(x_k)}{\|F(x_k + \lambda_c d_k)\|^2 - \|F(x_k)\|^2 - 2 F(x_k)^t J(x_k) F(x_k)\, \lambda_c}. $$
the number of iterations is greater than or equal to 500; or
(b)
the number of backtrackings at some line search is greater than or equal to 100.
The SANE algorithm could also fail if, at some iteration, a bad breakdown occurs, i.e., |Fkt Jk Fk | < εFk 2 , which will be reported with the symbol (**). The numerical results are shown in Tables 5, 6, 7, and 8. We report the problem number and the dimension of the problem (Function(n)), the number of iterations (IT), the number of function evaluations (F), the number of backtrackings (BK), and the CPU time in seconds (T). 30
Function(n) 1(1000) 1(5000) 1(10000) 2(500) 2(1000) 2(2000) 3(50) 3(100) 3(200) 4(99) 4(399) 4(999) 5(1000) 5(5000) 5(10000) 6(100) 6(500) 6(1000) 7(9) 7(99) 7(399) 8(1000) 8(5000) 8(10000) 9(2500) 9(5000) 9(10000) 10(5000) 10(10000) 10(15000)
IT 5 4 3 5 4 4 5 4 3 6 6 7 6 6 6 4 4 4 5 5 5 * 8 * 13 13 12 5 5 5
F 42 26 16 51 30 29 144 81 28 68 68 87 51 51 51 27 27 27 65 65 65 * 330 * 249 244 204 35 35 35
BK 0 0 0 0 0 0 0 0 0 2 2 2 0 0 0 0 0 0 11 11 11 * 4 * 0 0 0 0 0 0
T 0.09 0.23 0.38 0.14 0.1 0.16 0.25 0.21 0.05 0.05 0.085 0.27 0.1 0.36 0.7 0.1 3.05 6. 0.03 0.1 0.31 * 5. * 0.75 1.3 2.25 0.27 0.5 0.75
Function(n) 11(500) 11(1000) 11(2000) 12(100) 12(500) 12(1000) 13(100) 13(500) 13(1000) 14(100) 14(500) 14(1000) 15(500) 15(1000) 15(5000) 16(1000) 16(10000) 16(50000) 17(100) 17(500) 17(1000) 18(399) 18(999) 18(9999) 19(100) 19(500) 19(1000) 20(50) 20(100) 20(500)
IT 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 4 4 8 8 8 * * * 5 10 10 14 14 15
F 73 70 73 66 66 64 67 73 73 68 74 73 61 61 61 24 24 24 225 316 338 * * * 50 120 120 224 224 255
BK 0 0 0 1 1 1 1 2 2 1 2 2 0 0 0 0 0 0 0 0 0 * * * 0 0 0 0 0 0
T 0.21 0.23 0.4 0.05 0.21 0.23 0.1 0.33 0.5 0.1 0.35 0.49 0.12 0.17 0.6 0.05 0.38 1.9 0.19 2.2 2.7 * * * 0.05 0.1 0.17 2.343 6.088 92.342
Table 5: Iterations (IT), function evaluations (F), backtrackings (BK), and CPU time (T) for Newton-GMRES.
The results from Tables 5, 6, 7, and 8 are summarized in Table 9. In Table 9 we report the number of problems for which each method is a winner in number of iterations, number of function evaluations, and CPU time. We also report the failure percentage (FP) for each method. In Table 10 we compare the performance (number of problems for which each method is a winner) between the SANE algorithm and Newton-GMRES, the best of the four competitors in function evaluations.
31
Function(n) 1(1000) 1(5000) 1(10000) 2(500) 2(1000) 2(2000) 3(50) 3(100) 3(200) 4(99) 4(399) 4(999) 5(1000) 5(5000) 5(10000) 6(100) 6(500) 6(1000) 7(9) 7(99) 7(399) 8(1000) 8(5000) 8(10000) 9(2500) 9(5000) 9(10000) 10(5000) 10(10000) 10(15000)
IT 5 4 3 4 4 3 9 * 4 8 8 8 7 7 7 5 5 5 5 5 4 * * * 16 14 13 5 5 5
F 56 34 21 40 38 21 1737 * 538 182 261 261 255 255 255 50 50 50 90 90 114 * * * 737 551 436 50 50 50
BK 0 0 0 0 0 0 10 * 15 5 20 20 0 0 0 0 0 0 11 12 2 * * * 5 2 0 0 0 0
T 0.9 0.19 0.28 0.07 0.08 0.08 0.24 * 0.29 0.07 0.19 0.6 0.4 1.6 3. 0.14 4.4 8.5 0.04 0.13 1.05 * * * 1.25 2.25 3.6 0.24 0.46 0.7
Function(n) 11(500) 11(1000) 11(2000) 12(100) 12(500) 12(1000) 13(100) 13(500) 13(1000) 14(100) 14(500) 14(1000) 15(500) 15(1000) 15(5000) 16(1000) 16(10000) 16(50000) 17(100) 17(500) 17(1000) 18(399) 18(999) 18(9999) 19(100) 19(500) 19(1000) 20(50) 20(100) 20(500)
IT 5 5 5 5 5 5 6 6 6 6 6 6 5 5 5 4 4 4 7 8 8 * * * * * * 14 14 15
F 62 62 62 57 61 57 87 85 85 87 85 85 54 54 54 34 34 34 203 444 528 * * * * * * 329 329 375
BK 0 0 0 1 1 1 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 * * * * * * 0 0 0
T 0.13 0.15 0.26 0.04 0.18 0.18 0.1 0.3 0.48 0.1 0.3 0.48 0.08 0.1 0.4 0.005 0.33 1.65 0.07 0.5 0.8 * * * * * * 3.455 9.013 137.7
Table 6: Iterations (IT), function evaluations (F), backtrackings (BK), and CPU time (T) for Newton-Bi-CGSTAB.
32
Function(n) 1(1000) 1(5000) 1(10000) 2(500) 2(1000) 2(2000) 3(50) 3(100) 3(200) 4(99) 4(399) 4(999) 5(1000) 5(5000) 5(10000) 6(100) 6(500) 6(1000) 7(9) 7(99) 7(399) 8(1000) 8(5000) 8(10000) 9(2500) 9(5000) 9(10000) 10(5000) 10(10000) 10(15000)
IT 5 4 4 6 4 4 6 5 2 6 6 7 6 6 6 5 5 5 5 5 5 * * * 14 13 13 5 5 5
F 68 40 40 545 116 116 1713 1220 245 103 101 131 75 75 75 58 58 58 220 220 218 * * * 491 442 384 50 50 50
BK 0 0 0 0 0 0 2 0 0 2 2 2 0 0 0 0 0 0 3 3 2 * * * 4 4 2 0 0 0
T 0.11 0.24 0.47 1.25 0.5 0.9 0.48 0.46 0.23 0.05 0.08 0.29 0.08 0.29 0.55 0.13 4.3 8. 0.12 0.36 1.15 * * * 1.1 2. 3.1 0.18 0.33 0.49
Function(n) 11(500) 11(1000) 11(2000) 12(100) 12(500) 12(1000) 13(100) 13(500) 13(1000) 14(100) 14(500) 14(1000) 15(500) 15(1000) 15(5000) 16(1000) 16(10000) 16(50000) 17(100) 17(500) 17(1000) 18(399) 18(999) 18(9999) 19(100) 19(500) 19(1000) 20(50) 20(100) 20(500)
IT 5 5 5 6 6 6 5 5 5 5 6 5 5 5 5 4 4 4 7 8 8 * * * * 10 10 14 14 15
F 68 66 76 93 85 85 65 70 74 65 99 70 66 66 66 34 34 34 331 752 780 * * * * 175 175 329 329 375
BK 0 0 0 1 1 1 1 2 2 1 2 2 0 0 0 0 0 0 0 0 0 * * * * 0 0 0 0 0
T 0.15 0.18 0.35 0.04 0.19 0.23 0.08 0.3 0.6 0.09 0.35 0.49 0.1 0.13 0.49 0.04 0.24 1.2 0.15 1.4 1.6 * * * * 0.07 0.09 2.354 6.089 92.373
Table 7: Iterations (IT), function evaluations (F), backtrackings (BK), and CPU time (T) for Newton-TFQMR.
33
Function(n) 1(1000) 1(5000) 1(10000) 2(500) 2(1000) 2(2000) 3(50) 3(100) 3(200) 4(99) 4(399) 4(999) 5(1000) 5(5000) 5(10000) 6(100) 6(500) 6(1000) 7(9) 7(99) 7(399) 8(1000) 8(5000) 8(10000) 9(2500) 9(5000) 9(10000) 10(5000) 10(10000) 10(15000)
IT 11 5 5 12 9 7 46 80 47 120 146 128 35 35 35 8 8 8 12 12 12 6 6 6 19 16 17 6 6 6
F 23 11 11 27 21 18 94 163 100 287 358 319 74 74 74 17 17 17 37 37 37 13 13 13 42 36 39 13 13 13
BK 0 0 0 1 1 1 1 2 5 45 59 56 2 2 2 0 0 0 4 4 4 0 0 0 1 1 1 0 0 0
T 0.05 0.09 0.17 0.05 0.05 0.07 0.08 0.15 0.1 0.5 1.1 2.9 0.12 0.43 0.8 0.16 1.55 9. 0.04 0.15 0.48 0.04 0.15 0.28 0.38 0.65 1.25 0.09 0.16 0.24
Function(n) 11(500) 11(1000) 11(2000) 12(100) 12(500) 12(1000) 13(100) 13(500) 13(1000) 14(100) 14(500) 14(1000) 15(500) 15(1000) 15(5000) 16(1000) 16(10000) 16(50000) 17(100) 17(500) 17(1000) 18(399) 18(999) 18(9999) 19(100) 19(500) 19(1000) 20(50) 20(100) 20(500)
IT 20 22 22 12 11 11 15 14 14 15 14 14 16 16 16 6 6 6 59 105 125 116 114 80 ** 13 12 13 8 1
F 42 45 45 26 24 24 33 31 31 33 31 31 34 34 34 13 13 13 133 245 305 281 278 198 ** 28 26 27 17 3
BK 1 0 0 1 1 1 2 2 2 2 2 2 1 1 1 0 0 0 13 24 32 42 44 31 ** 1 1 1 1 1
T 0.1 0.16 0.29 0.04 0.1 0.19 0.13 0.35 0.65 0.13 0.35 0.65 0.06 0.11 0.39 0.02 0.13 0.65 0.12 0.29 0.6 0.38 0.65 4.3 ** 0.04 0.051 2.253 3.856 12.598
Table 8: Iterations (IT), function evaluations (F), backtrackings (BK), and CPU time (T) for the SANE algorithm.
34
Method              IT    F    T   FP(%)
Newton-GMRES         8    9    8    8.33
Newton-Bi-CGSTAB     6    0    9   16.67
Newton-TFQMR         6    0    6   11.67
SANE                 9   51   35    1.67
Table 9: Number of problems for which each method is a winner, and failure percentage (FP) for each method.
Method          IT    F    T
Newton-GMRES    51    9   19
SANE             9   51   41
Table 10: SANE algorithm vs. Newton-GMRES.
References [1] G. Alefeld, A. Gienger, and F. Potra (1994) Efficient validation of solutiona of nonlinear systems. SIAM Journal on Numerical Analysis, 31, pp. 252–260. [2] J. Barzilai and J. M. Borwein (1988). Two-point step size gradient methods, IMA Journal of Numerical Analysis, 8, 141-148. [3] S. Bellavia and B. Morini (2001). A globally convergent Newton-GMRES subspace method for systems of nonlinear equations. SIAM Journal on Scientific Computing, 23, pp. 940–960. [4] Y. Bing and G. Lin (1991) An efficient implementation of Merrill’s method for sparse or partially separable systems of nonlinear equations. SIAM Journal on Optimization, 2, pp. 206–221. [5] E. G. Birgin, J. M. Mart´ınez and M. Raydan (2000). Nonmonotone spectral projected gradient methods on convex sets, SIAM Journal on Optimization, 10, pp. 1196–1211. [6] E. G. Birgin, J. M. Mart´ınez and M. Raydan (2001), Algorithm 813: SPG - Software for convex-constrained optimization, ACM Transactions on Mathematical Software, 27, pp. 340–349. [7] E. G. Birgin, J. M. Mart´ınez and M. Raydan (2003). Inexact Spectral Gradient method for convex-constrained optimization, IMA Journal on Numerical Analysis, 8, pp. 141–148. [8] P.N. Brown and Y. Saad (1990). Hybrid Krylov methods for nonlinear systems of equations. SIAM Journal on Scientific Computing, 11, pp. 450–481.
35
[9] P.N. Brown and Y. Saad (1994). Convergence theory of nonlinear Newton-Krylov algorithms. SIAM Journal on Optimization, 4, pp. 297–330. [10] Y. H. Dai and L. Z. Liao (2002). R-linear convergence of the Barzilai and Borwein gradient method, IMA Journal of Numerical Analysis , 22, pp. 1–10. [11] Y.H. Dai (2001). On Nonmonotone Line Search. JOTA, to appear. [12] J. E. Dennis and R. B. Schnabel (1983), Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, New Jersey. [13] S. C. Eisenstat and H. F. Walker (1994). Globally convergent inexact Newton methods. SIAM J. Opt., 4, pp. 393-422. [14] R. Fletcher (1990). Low storage methods for unconstrained optimization, Lectures in Applied Mathematics (AMS), 26, pp. 165–179. [15] R. Fletcher (2001). On the Barzilai-Borwein method, Department of Mathematics, University of Dundee NA/207, Dundee, Scotland. [16] A. Friedlander, M.A. Gomes-Ruggiero, D.N. Kozakevich, J.M. Mart´ınez, and S.A. Santos (1997) Solving nonlinear systems of equations by mean of quasi-Newton methods with a nonmonotone strategy. Optimizations Methods and Software, 8, pp. 25–51. [17] M. Gasparo (2000). A nonmonotone hybrid method for nonlinear systems. Optimization Methods & Software, 13, pp. 79–94. [18] M. Gomez-Ruggiero, J. M. Mart´inez, and A. Moretti (1992). Comparing Algorithms for solving sparse nonlinear systems of equations. SIAM J. Sci. Stat. Comp., 23, pp. 459-483. [19] L. Grippo, F. Lampariello, and S. Lucidi (1986). A nonmonotone line search technique for Newton’s method. SIAM Journal on Numerical Analysis, 23, pp. 707–716. [20] S. Incertin, F. Zirilli, and V. Parisi (1981) A Fortran subroutine for solving systems of nonlinear simultaneous equations. Computer Journal, 24, pp. 87–91. [21] C. T. Kelley (1995). Iterative Methods for Linear and Nonlinear Equations. SIAM, Philadelphia. [22] W. La Cruz and M. Raydan (2003). Nonmonotone Spectral Methods for LargeScale Nonlinear Systems. Optimization Methods and Software, 18, pp. 583–599. [23] G. Li (1989) Successive column correction algorithms for solving sparse nonlinear systems of equations. Mathematical Programming, 43, pp. 187–207. [24] D.H. Li and M. Fukushima (2000). A derivative-free line search and global convergence of Broyden-like method for nonlinear equations. Optimization Methods & Software., 13, pp. 181–201.
36
[25] F. Luengo, M. Raydan, W. Glunt and T. L. Hayden (2002). Preconditioned Spectral Gradient Method, Numerical Algorithms, 30, 241–258. [26] J. M. Mart´ınez (2000). Practical quasi-Newton methods for solving nonlinear systems, Journal of Computational and Applied Mathematics, 124, 97–122. [27] B. Molina and M. Raydan (1996). Preconditioned Barzilai-Borwein method for the numerical solution of partial differential equations, Numerical Algorithms, 13, pp. 45–60. [28] J.J. Mor´e, B.S. Garbow, and K.E. Hillstr¨ om (1981) Testing uncosntrained optimization software. ACM Transactions on Mathematical Software, 7, 17–41. [29] J. M. Ortega and W. C. Rheinboldt (1970). Iterative solution of nonlinear equations in several variables, Academic Press, New York. [30] M. Raydan (1993). On the Barzilai and Borwein choice of the steplength for ghe gradient method, IMA Journal on Numerical Analysis, 13, 321–326. [31] M. Raydan (1997). The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM Journal on Optimization., 7, pp. 26– 33. [32] S.M. Roberts and J.J. Shipman (1976) On the closed form solution of Troesch’s problem. Journal of Computational Physics, 21, pp. 291–304. [33] P.L. Toint (1978). Some numerical results using a sparse matrix updating formula in unconstrained optimization. Mathematics of Computation, 143, pp. 839–851.
37