Optimization Methods and Software Vol. 18, No. 5, October 2003, pp. 583–599

NONMONOTONE SPECTRAL METHODS FOR LARGE-SCALE NONLINEAR SYSTEMS

WILLIAM LA CRUZ a,∗ and MARCOS RAYDAN b,†

a Dpto. de Electrónica, Computación y Control, Facultad de Ingeniería, Universidad Central de Venezuela, Caracas, Venezuela; b Dpto. de Computación, Facultad de Ciencias, Universidad Central de Venezuela, Ap. 47002, Caracas 1041-A, Venezuela

∗ E-mail: [email protected]
† Corresponding author. E-mail: [email protected]

© 2003 Taylor & Francis Ltd
ISSN 1055-6788 print; ISSN 1029-4937 online
DOI: 10.1080/10556780310001610493

(Received 19 August 2002; In final form 23 July 2003)

The spectral gradient method has proved to be effective for solving large-scale optimization problems. In this work we extend the spectral approach to solve nonlinear systems of equations. We consider a strategy based on nonmonotone line search techniques to guarantee global convergence, and discuss implementation details for solving large-scale problems. We compare the performance of our new method with recent implementations of inexact Newton schemes based on Krylov subspace inner iterative methods for the linear systems. Our numerical experiments indicate that the spectral approach for solving nonlinear systems competes favorably with well-established numerical methods.

Keywords: Spectral gradient method; Nonmonotone line search; Krylov subspace methods

1 INTRODUCTION

Consider the nonlinear system of equations

F(x) = 0,   (1)

where F: ℝⁿ → ℝⁿ is a continuously differentiable mapping. We are interested in the large-scale case, for which the Jacobian of F is either not available or requires a prohibitive amount of storage. Different methods have been developed for solving (1). The most popular schemes are based on Newton's method or quasi-Newton methods [8,13–15,21,22,25,26,29,33]. These methods are attractive because they converge rapidly from any sufficiently good initial guess. Their main drawback, for large values of n, is that they need to solve a linear system of equations at each iteration using the Jacobian matrix or an approximation of it. A suitable approach, for large values of n, is to use inexact Newton type methods, which solve the linear systems inexactly by means of iterative solvers [12,16,27,34]. The inexactness comes from the fact that the inner iterative methods are stopped prematurely, solving the linear system only approximately, at a low computational cost per iteration.




Modern implementations use Krylov subspace iterative solvers [2,6,7,24] (e.g. TFQMR [20], GMRES [32] and Bi-CGSTAB [35]). In this work we propose a different approach for the large-scale case that is not based on a linearization procedure. We are interested in extending the spectral gradient approach to solve (1), using in a systematic way ±F(x) as search directions. Spectral gradient methods are low-cost nonmonotone schemes for finding local minimizers. They were introduced by Barzilai and Borwein [1], their convergence for quadratics was established by Raydan [30], and a global scheme for nonquadratic functions, which uses a variant of the nonmonotone line search of Grippo et al. [23], was discussed more recently in Ref. [31]. The approach has been applied successfully to find local minimizers of large-scale problems [3–5,9,10,28]. A recent review is presented by Fletcher [19].

The rest of the paper is organized as follows. In Section 2 we recall the spectral approach for unconstrained optimization, present the connection that leads to the extension for nonlinear systems of equations, and discuss a suitable nonmonotone line search scheme for the solution of (1). In Section 3 we present the new algorithm and a convergence analysis. In Section 4 we show extensive numerical results on standard test problems, comparing with recent implementations of a variety of inexact Newton methods. Finally, in Section 5 we present some concluding remarks.

2 SPECTRAL APPROACH FOR NONLINEAR SYSTEMS

We begin by considering the unconstrained minimization problem

min_{x ∈ ℝⁿ} f(x),   (2)

where f: ℝⁿ → ℝ is continuously differentiable and its gradient is available. Most numerical methods for solving (2) solve the nonlinear system ∇f(x) = 0, using the objective function f as a merit function to globalize the process. Therefore, a natural connection for solving (1) is to apply the same techniques, but now forcing F(x) = 0 and using

f(x) = ‖F(x)‖₂² = F(x)^t F(x),   (3)

as a merit function. Among the possible options for the large-scale case, the spectral method has a number of interesting features that make it attractive for the numerical solution of (2). Following the motivation introduced in Ref. [1] for the spectral approach, but now solving (1), the iterations are defined as x_{k+1} = x_k − λ_k F(x_k), where

λ_k = (s_{k−1}^t s_{k−1}) / (s_{k−1}^t y_{k−1}),   (4)

s_{k−1} = x_k − x_{k−1}, and y_{k−1} = F(x_k) − F(x_{k−1}). Obtaining the step length by (4) requires less computational work than a line search, involves only the last two iterations, and incorporates first-order information into the search direction.


Indeed, the inverse of (4) is a Rayleigh quotient corresponding to the average Jacobian matrix ∫₀¹ J(x_{k−1} + t s_{k−1}) dt, where J(x) is the Jacobian of F at the vector x [see Refs. 18,30 for details]. Since the gradient of (3) at x_k is given by

∇f(x_k) = 2 J(x_k)^t F(x_k),   (5)

then d_k = −F(x_k) is not necessarily a descent direction for the function (3) at x_k, i.e., ∇f(x_k)^t d_k = 2 F(x_k)^t J(x_k) d_k is not necessarily a negative number. Indeed, since in general the symmetric part of J(x_k) is not definite, F(x_k)^t J(x_k) F(x_k) could be positive, negative or even zero. We can partially overcome this difficulty by using d_k = F(x_k) or d_k = −F(x_k), whichever is convenient, together with a suitable nonmonotone line search that will be discussed in the next paragraph. To the best of our knowledge, this is the first time that d_k = ±F(x_k) is proposed as a search direction in a systematic way to solve (1).

For unconstrained optimization, one of the most popular nonmonotone line search techniques was introduced by Grippo et al. [23]. It was used in Ref. [31] to globalize the spectral gradient method. More recently, nonmonotone line search techniques have also been proposed for solving (1) in connection with Newton and quasi-Newton type methods [17,21,25]. The Grippo–Lampariello–Lucidi (GLL) condition can be written as follows (see also Dai [11]):

f(x_{k+1}) ≤ max_{0 ≤ j ≤ M} f(x_{k−j}) + γ λ_k ∇f(x_k)^t d_k,

where M is a nonnegative integer and γ is a small positive number. It follows that if ∇f(x_k)^t d_k < 0, then the GLL condition is satisfied for all λ_k sufficiently close to zero, and we can compute a step length λ_k by a finite backtracking process. However, when ∇f(x_k)^t d_k = 0, the existence of a λ_k satisfying the GLL condition is not guaranteed. Unfortunately, when d_k = ±F(x_k) for solving (1), ∇f(x_k)^t d_k = ±2 F(x_k)^t J(x_k) F(x_k) could be zero or close to zero, and then stagnation or breakdown will occur during the line search process.
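For concreteness, the GLL test is easy to state in code. The following minimal sketch (the helper name and its arguments are ours, not part of the original presentation) accepts a trial step of length λ whenever the condition above holds:

```python
def gll_accepts(f_hist, f_trial, lam, grad_dot_d, gamma=1e-4, M=10):
    """Nonmonotone (GLL) acceptance test for a trial step of length lam.

    f_hist     -- merit values f(x_0), ..., f(x_k) computed so far
    f_trial    -- f(x_k + lam * d_k)
    grad_dot_d -- directional derivative grad f(x_k)^t d_k
    """
    f_ref = max(f_hist[-(M + 1):])   # max over f(x_{k-j}), 0 <= j <= min(k, M)
    return f_trial <= f_ref + gamma * lam * grad_dot_d
```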

3 NEW ALGORITHM

Combining the systematic use of the search direction d_k = ±F_k, the spectral choice of step length, and the GLL line search globalization strategy, we obtain the SANE (Spectral Approach for Nonlinear Equations) algorithm.

SANE algorithm
Let α_0 ∈ ℝ, M a nonnegative integer, γ > 0, 0 < σ_1 < σ_2 < 1, 0 < ε < 1, and δ ∈ [ε, 1/ε]. Let x_0 ∈ ℝⁿ be the initial guess and set k = 0.
Step 1: If ‖F_k‖ = 0, stop the process.
Step 2: If |F_k^t J_k F_k| / F_k^t F_k < ε, stop the process.
Step 3: If α_k ≤ ε or α_k ≥ 1/ε, then set α_k = δ.
Step 4: Set sgn_k = sgn(F_k^t J_k F_k) and d_k = −sgn_k F_k.
Step 5: Set λ = 1/α_k.
Step 6: If f(x_k + λ d_k) ≤ max_{0 ≤ j ≤ min(k,M)} f(x_{k−j}) + 2γ λ F_k^t J_k d_k, go to Step 8.
Step 7: Choose σ ∈ [σ_1, σ_2], set λ = σλ, and go to Step 6.
Step 8: Set λ_k = λ, x_{k+1} = x_k + λ_k d_k, and y_k = F_{k+1} − F_k.
Step 9: Set α_{k+1} = sgn_k (d_k^t y_k)/(λ_k d_k^t d_k), set k = k + 1, and go to Step 1.


Remarks
i. The scalar F_k^t J_k F_k, which appears frequently in the algorithm, can be approximated by the well-known formula

F_k^t J_k F_k ≈ F_k^t ( (F(x_k + h F_k) − F_k) / h ),   (6)

where h is a small positive number. The approximation given by (6) does not require explicit knowledge of the Jacobian matrix, which is suitable for large-scale problems. Neither does it require the product J_k^t F_k, which cannot be approximated without explicit knowledge of the Jacobian matrix. Hence, in our case, the gradient ∇f(x_k) given by (5) is not available.
ii. At Step 9 the spectral step length is computed using

α_{k+1} = sgn_k ( (d_k^t y_k) / (λ_k d_k^t d_k) ).

The value of sgn_k on the right-hand side guarantees that α_{k+1} is a positive number in most cases. Further details are given in Lemma 3.2.
iii. The sequence {x_k} generated by algorithm SANE is contained in the closed set

Ω_0 = {x: 0 ≤ f(x) ≤ f(x_0)}.   (7)
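To make Steps 1–9 concrete, the following is a rough Python sketch of the iteration, not the authors' MATLAB implementation. All identifiers are ours; F_k^t J_k F_k is approximated by (6) from Remark i, and the backtracking of Step 7 simply halves λ (a fixed σ) rather than choosing σ ∈ [σ_1, σ_2] by interpolation (the interpolation actually used in the experiments is described in Section 4).

```python
import numpy as np

def sane(F, x0, M=10, gamma=1e-4, eps=1e-8, delta=1.0, sigma=0.5,
         h=1e-7, tol=1e-6, max_iter=500):
    """Sketch of the SANE algorithm (Steps 1-9) for F(x) = 0."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    f = float(Fx @ Fx)                      # merit function f(x) = ||F(x)||_2^2
    f_hist = [f]
    alpha = 1.0                             # alpha_0
    for _ in range(max_iter):
        if np.sqrt(f) <= tol:               # Step 1 (practical version of ||F_k|| = 0)
            return x
        # Remark i: F_k^t J_k F_k ~ F_k^t (F(x_k + h F_k) - F_k) / h
        FtJF = float(Fx @ (F(x + h * Fx) - Fx)) / h
        if abs(FtJF) / f < eps:             # Step 2: bad breakdown
            raise RuntimeError("bad breakdown: |F^t J F| / F^t F < eps")
        if alpha <= eps or alpha >= 1.0 / eps:
            alpha = delta                   # Step 3: safeguard the spectral coefficient
        sgn = 1.0 if FtJF > 0 else -1.0     # Step 4: d_k = -sgn(F^t J F) F_k
        d = -sgn * Fx
        FtJd = -abs(FtJF)                   # F_k^t J_k d_k = -|F_k^t J_k F_k|
        lam = 1.0 / alpha                   # Step 5
        f_ref = max(f_hist[-(M + 1):])      # nonmonotone reference value
        while True:                         # Steps 6-7: nonmonotone backtracking
            F_new = F(x + lam * d)
            f_new = float(F_new @ F_new)
            if f_new <= f_ref + 2.0 * gamma * lam * FtJd:
                break
            lam *= sigma
        y = F_new - Fx                      # Step 8
        alpha = sgn * float(d @ y) / (lam * float(d @ d))   # Step 9
        x = x + lam * d
        Fx, f = F_new, f_new
        f_hist.append(f)
    return x
```

A caller only needs to supply the residual map F and the initial guess x_0; each iteration costs two F-evaluations (plus one per backtracking step) and a few inner products, which is what keeps the method cheap.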

We begin our analysis by discussing some of the properties of the SANE algorithm.

LEMMA 3.1 The SANE algorithm is well defined.

Proof If at iteration k the algorithm does not stop at Step 1 or at Step 2, then d_k is a descent direction, and for γ > 0 the condition

f(x_k + λ d_k) ≤ max_{0 ≤ j ≤ min(k,M)} f(x_{k−j}) + 2γ λ F_k^t J_k d_k   (8)

holds by continuity for λ > 0 sufficiently small. Therefore, the algorithm will not cycle indefinitely between Steps 6 and 7. □

The following lemma establishes that in most cases α_{k+1} is positive.

LEMMA 3.2 Let α_{k+1} be computed as in Step 9. Then α_{k+1} > 0 when one of the following cases holds: (i) F_k^t F_{k+1} < 0; (ii) F_k^t F_{k+1} > 0 and ‖F_{k+1}‖ < ‖F_k‖.

Proof Since α_{k+1} = sgn_k ((d_k^t y_k)/(λ_k d_k^t d_k)) = −(F_k^t y_k)/(λ_k F_k^t F_k), then sgn(α_{k+1}) = −sgn(F_k^t y_k). Suppose that (i) holds; then F_k^t y_k = F_k^t F_{k+1} − F_k^t F_k < 0 and α_{k+1} > 0. On the other hand, if (ii) holds, by the Cauchy–Schwarz inequality we obtain 0 < F_k^t F_{k+1} ≤ ‖F_k‖ ‖F_{k+1}‖ < ‖F_k‖² = F_k^t F_k, and so F_k^t y_k = F_k^t F_{k+1} − F_k^t F_k < 0. Hence α_{k+1} > 0. □


Notice that the only case in which α_{k+1} could be negative, and as a consequence we would have to choose δ > 0 in Step 3, is when F_k^t F_{k+1} > 0 and ‖F_{k+1}‖ ≥ ‖F_k‖, i.e., no descent is observed in the merit function.

For the rest of our analysis we need the following assumption.

ASSUMPTION A
(i) The set Ω_0 given by (7) is bounded.
(ii) F(x) is continuously differentiable on Ω_0.
(iii) J(x) is nonsingular for all x ∈ Ω_0.

Before proving the convergence of algorithm SANE we need the following lemma.

LEMMA 3.3 Under Assumption A, if {x_k} is generated by algorithm SANE, then there exist positive constants c_1, c_2, and c_3 such that

‖d_k‖ ≤ c_1 ‖∇f(x_k)‖,   (9)

‖∇f(x_k)‖ ≤ c_2 ‖d_k‖,   (10)

and

F_k^t J_k d_k ≤ −c_3 ‖∇f(x_k)‖²,   (11)

for all k.

Proof Let T_1 and T_2 be positive constants such that ‖J(x)^{−1}‖ ≤ T_1 and ‖J(x)‖ ≤ T_2, for all x ∈ Ω_0. Since ‖d_k‖ = ‖F_k‖ and F_k = (1/2) J_k^{−t} ∇f(x_k), then

‖d_k‖ ≤ (1/2) ‖J_k^{−t} ∇f(x_k)‖ ≤ (T_1/2) ‖∇f(x_k)‖,

i.e., c_1 = T_1/2 > 0. On the other hand, ∇f(x_k) = 2 J_k^t F_k, and so ‖∇f(x_k)‖ ≤ 2 ‖J_k‖ ‖F_k‖ = 2 ‖J_k‖ ‖d_k‖ ≤ 2 T_2 ‖d_k‖, i.e., c_2 = 2 T_2 > 0. Finally, at every k we have (Step 2)

|F_k^t J_k F_k| ≥ ε ‖F_k‖².   (12)

From Step 4 we obtain d_k = −sgn(F_k^t J_k F_k) F_k, and so

F_k^t J_k d_k = F_k^t J_k (−sgn(F_k^t J_k F_k)) F_k =
  −F_k^t J_k F_k,  if F_k^t J_k F_k > 0,
   F_k^t J_k F_k,  if F_k^t J_k F_k < 0.   (13)

Using (12), (13), ‖d_k‖ = ‖F_k‖, and (10), it follows that

F_k^t J_k d_k ≤ −ε ‖F_k‖² = ε(−‖d_k‖²) ≤ ε(−c_2^{−2} ‖∇f(x_k)‖²) = (−ε c_2^{−2}) ‖∇f(x_k)‖²,

and therefore c_3 = ε c_2^{−2} > 0. □

Our first theorem establishes that either algorithm SANE terminates prematurely with a good breakdown (F_j = 0) or a bad breakdown (|F_j^t J_j F_j| < ε ‖F_j‖²), or it converges from any initial guess.

THEOREM 3.4 Under Assumption A, algorithm SANE either terminates at a finite iteration j where F_j = 0 or |F_j^t J_j F_j| < ε ‖F_j‖², or it generates a sequence {x_k} such that lim_{k→∞} ‖F_k‖ = 0.

Proof Let us assume that the algorithm SANE does not terminate, and let x̄ be an accumulation point of {x_k}. We make use of the first part of the proof of the convergence theorem in Ref. [23, p. 709]. Clearly, Ω_0 is compact. Let us define m(k) = min(k, M). Clearly, m(0) = 0 and 0 ≤ m(k) ≤ min(m(k − 1) + 1, M) for k ≥ 1. Moreover, 0 < λ_k ≤ max{ε^{−1}, δ^{−1}} for all k. Using Lemma 3.3, we can obtain positive numbers c_1 and c_3 such that ‖d_k‖ ≤ c_1 ‖∇f(x_k)‖ and F_k^t J_k d_k ≤ −c_3 ‖∇f(x_k)‖². Finally, in Ref. [23] the trial steps are all constant (a > 0); in our algorithm, all trial steps are in the positive closed and bounded interval [min{ε, δ^{−1}}, max{ε^{−1}, δ^{−1}}]. Therefore, repeating the arguments in Ref. [23, pp. 710, 711], we obtain that ∇f(x̄) = 2 J(x̄)^t F(x̄) = 0. But J(x) is nonsingular in Ω_0; therefore, F(x̄) = 0 and the result is established. □

Notice that Step 2 in the SANE algorithm stops the process when a bad breakdown occurs. In general, Step 2 has to be taken into account, for otherwise the method could converge to a point x̄ at which ‖F(x̄)‖ is away from zero but the vector F(x̄) is orthogonal to ∇f(x̄). Clearly, this could only happen when the symmetric part of J(x) is indefinite for some vectors x ∈ Ω_0. Our final result shows the strong global convergence of algorithm SANE, without Step 2, when the symmetric part of J(x) is positive (negative) definite for all x ∈ Ω_0.

COROLLARY 3.5 Under Assumption A, if J_S(x) = (J(x) + J(x)^t)/2 is positive (negative) definite for all x ∈ Ω_0, then algorithm SANE, without Step 2, either terminates at a finite iteration j where F_j = 0 or it generates a sequence {x_k} such that lim_{k→∞} ‖F_k‖ = 0.

Proof Let us assume without any loss of generality that J_S(x) is positive definite for all x ∈ Ω_0. By continuity and compactness there exists µ_min > 0 such that µ_min ≤ λ_min(J_S(x_k)) for all k, where λ_min(J_S(x)) represents the smallest eigenvalue of J_S(x). Therefore, since F_k^t J_k F_k = F_k^t J_S(x_k) F_k ≥ λ_min(J_S(x_k)) ‖F_k‖² ≥ µ_min ‖F_k‖² > 0 for all k ≥ 0, the result follows from Theorem 3.4, i.e., lim_{k→∞} ‖F_k‖ = 0. □

4 NUMERICAL RESULTS

We compare the performance of the SANE algorithm, on a set of large-scale test problems, with some recent implementations of inexact Newton methods. In particular, we compare with Inexact Newton with GMRES (ING), Inexact Newton with Bi-CGSTAB (INBC), and Inexact Newton with TFQMR (INT). For all these techniques, an Armijo line search condition is used as a globalization strategy. For ING and INBC the Eisenstat–Walker formula is included [16]. The MATLAB code for all these methods was obtained from Kelley [24]. For all our experiments we use MATLAB 6.1 on a Pentium III personal computer at 700 MHz. In the appendix we list the test functions, F(x), and the associated initial guess, x_0, used in our experiments.

The parameters used for the inexact Newton methods are fully described in Ref. [24]. For the SANE algorithm we use γ = 10⁻⁴, ε = 10⁻⁸, σ_1 = 0.1, σ_2 = 0.5, α_0 = 1, M = 10,

δ = 1,        if ‖F_k‖ > 1,
δ = ‖F_k‖,    if 10⁻⁵ ≤ ‖F_k‖ ≤ 1,
δ = 10⁻⁵,     if ‖F_k‖ < 10⁻⁵,

and the scalar F_k^t J_k F_k is computed by (6), with h = 10⁻⁷.
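As a small illustration (the helper name is ours, not part of the original code), the safeguard value δ used in Step 3 under this setting can be written as:

```python
def delta_of(norm_F):
    """Safeguard value delta as a function of ||F_k|| (experimental setting above)."""
    if norm_F > 1.0:
        return 1.0
    if norm_F >= 1e-5:
        return norm_F
    return 1e-5
```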

TABLE I  Iterations (IT), function evaluations (F), backtrackings (BK), and CPU time (T) for Newton-GMRES.
(Each row lists the three values corresponding to the three problem sizes in the Function(n) column.)

Function(n)                        IT          F               BK          T
1(1000) 1(5000) 1(10,000)          5 4 3       42 26 16        0 0 0       0.09 0.23 0.38
2(500) 2(1000) 2(2000)             5 4 4       51 30 29        0 0 0       0.14 0.1 0.16
3(50) 3(100) 3(200)                5 4 3       144 81 28       0 0 0       0.25 0.21 0.05
4(99) 4(399) 4(999)                6 6 7       68 68 87        2 2 2       0.05 0.085 0.27
5(1000) 5(5000) 5(10,000)          6 6 6       51 51 51        0 0 0       0.1 0.36 0.7
6(100) 6(500) 6(1000)              4 4 4       27 27 27        0 0 0       0.1 3.05 6.00
7(9) 7(99) 7(399)                  5 5 5       65 65 65        11 11 11    0.03 0.1 0.31
8(1000) 8(5000) 8(10,000)          * 8 *       * 330 *         * 4 *       * 5.00 *
9(2500) 9(5000) 9(10,000)          13 13 12    249 244 204     0 0 0       0.75 1.3 2.25
10(5000) 10(10,000) 10(15,000)     5 5 5       35 35 35        0 0 0       0.27 0.5 0.75
11(500) 11(1000) 11(2000)          6 6 6       73 70 73        0 0 0       0.21 0.23 0.4
12(100) 12(500) 12(1000)           6 6 6       66 66 64        1 1 1       0.05 0.21 0.23
13(100) 13(500) 13(1000)           6 6 6       67 73 73        1 2 2       0.1 0.33 0.5
14(100) 14(500) 14(1000)           6 6 6       68 74 73        1 2 2       0.1 0.35 0.49
15(500) 15(1000) 15(5000)          6 6 6       61 61 61        0 0 0       0.12 0.17 0.6
16(1000) 16(10,000) 16(50,000)     4 4 4       24 24 24        0 0 0       0.05 0.38 1.9
17(100) 17(500) 17(1000)           8 8 8       225 316 338     0 0 0       0.19 2.2 2.7
18(399) 18(999) 18(9999)           * * *       * * *           * * *       * * *
19(100) 19(500) 19(1000)           5 10 10     50 120 120      0 0 0       0.05 0.1 0.17
20(50) 20(100) 20(500)             14 14 15    224 224 255     0 0 0       2.343 6.088 92.342


For choosing λ at Step 7, we use the following parabolic model (Kelley [24, pp. 142, 143]). Denoting by λ_c > 0 the current value of λ, we update λ as follows:

λ = σ_1 λ_c  if λ_t < σ_1 λ_c;   λ = σ_2 λ_c  if λ_t > σ_2 λ_c;   λ = λ_t  otherwise,

where

λ_t = −λ_c² F(x_k)^t J(x_k) F(x_k) / ( ‖F(x_k + λ_c d_k)‖² − ‖F(x_k)‖² − 2 F(x_k)^t J(x_k) F(x_k) λ_c ).

In all our experiments we stop the process when

‖F(x_k)‖ / √n ≤ e_a + e_r ‖F(x_0)‖ / √n,   (14)

where e_a = 10⁻⁶ and e_r = 10⁻⁶. We claim that the method fails, and use the symbol (∗), when one of the following conditions holds: (a) the number of iterations is greater than or equal to 500; or (b) the number of backtrackings at some line search is greater than or equal to 100.
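A sketch of this safeguarded parabolic update and of the stopping test (14) follows; the function names are ours, and in practice the scalar F(x_k)^t J(x_k) F(x_k) would again be approximated by (6):

```python
import numpy as np

def parabolic_lambda(lam_c, FtJF, f_old, f_trial, sigma1=0.1, sigma2=0.5):
    """Safeguarded parabolic choice of the next trial step length (Step 7).

    f_old   -- ||F(x_k)||^2
    f_trial -- ||F(x_k + lam_c d_k)||^2
    Assumes the denominator below is nonzero.
    """
    lam_t = -lam_c**2 * FtJF / (f_trial - f_old - 2.0 * FtJF * lam_c)
    if lam_t < sigma1 * lam_c:
        return sigma1 * lam_c
    if lam_t > sigma2 * lam_c:
        return sigma2 * lam_c
    return lam_t

def converged(Fx, F0_norm, n, ea=1e-6, er=1e-6):
    """Stopping test (14): ||F(x_k)||/sqrt(n) <= e_a + e_r ||F(x_0)||/sqrt(n)."""
    return np.linalg.norm(Fx) / np.sqrt(n) <= ea + er * F0_norm / np.sqrt(n)
```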

TABLE II  Iterations (IT), function evaluations (F), backtrackings (BK), and CPU time (T) for Newton-Bi-CGSTAB

Function(n)                        IT          F               BK          T
1(1000) 1(5000) 1(10,000)          5 4 3       56 34 21        0 0 0       0.9 0.19 0.28
2(500) 2(1000) 2(2000)             4 4 3       40 38 21        0 0 0       0.07 0.08 0.08
3(50) 3(100) 3(200)                9 * 4       1737 * 538      10 * 15     0.24 * 0.29
4(99) 4(399) 4(999)                8 8 8       182 261 261     5 20 20     0.07 0.19 0.6
5(1000) 5(5000) 5(10,000)          7 7 7       255 255 255     0 0 0       0.4 1.6 3.0
6(100) 6(500) 6(1000)              5 5 5       50 50 50        0 0 0       0.14 4.4 8.5
7(9) 7(99) 7(399)                  5 5 4       90 90 114       11 12 2     0.04 0.13 1.05
8(1000) 8(5000) 8(10,000)          * * *       * * *           * * *       * * *
9(2500) 9(5000) 9(10,000)          16 14 13    737 551 436     5 2 0       1.25 2.25 3.6
10(5000) 10(10,000) 10(15,000)     5 5 5       50 50 50        0 0 0       0.24 0.46 0.7
11(500) 11(1000) 11(2000)          5 5 5       62 62 62        0 0 0       0.13 0.15 0.26
12(100) 12(500) 12(1000)           5 5 5       57 61 57        1 1 1       0.04 0.18 0.18
13(100) 13(500) 13(1000)           6 6 6       87 85 85        2 2 2       0.1 0.3 0.48
14(100) 14(500) 14(1000)           6 6 6       87 85 85        2 2 2       0.1 0.3 0.48
15(500) 15(1000) 15(5000)          5 5 5       54 54 54        0 0 0       0.08 0.1 0.4
16(1000) 16(10,000) 16(50,000)     4 4 4       34 34 34        0 0 0       0.005 0.33 1.65
17(100) 17(500) 17(1000)           7 8 8       203 444 528     0 0 0       0.07 0.5 0.8
18(399) 18(999) 18(9999)           * * *       * * *           * * *       * * *
19(100) 19(500) 19(1000)           * * *       * * *           * * *       * * *
20(50) 20(100) 20(500)             14 14 15    329 329 375     0 0 0       3.455 9.013 137.7


The SANE algorithm could also fail if, at some iteration, a bad breakdown occurs, i.e., |F_k^t J_k F_k| < ε ‖F_k‖², which will be reported with the symbol (∗∗). The numerical results are shown in Tables I–IV. We report the problem number and the dimension of the problem (Function(n)), the number of iterations (IT), the number of function evaluations (F), the number of backtrackings (BK), and the CPU time in seconds (T). The results from Tables I–IV are summarized in Table V. In Table V we report the number of problems for which each method is a winner in number of iterations, number of function evaluations, and CPU time. We also report the failure percentage (FP) for each method. In Table VI we compare the performance (number of problems for which each method is a winner) between the SANE algorithm and Newton-GMRES, the best of the four competitors in function evaluations. Finally, in Table VII we also report the behavior of the SANE algorithm for Function 12 (trigexp) with different initial points, to illustrate the global convergence. For this problem, the solution vector is x∗ = (1, . . . , 1)^t. The chosen initial points (Init) are x_0 = (0, . . . , 0)^t, x_1 = (1/n, 2/n, . . . , 1)^t, x_2 = (1 − 1/n, 1 − 2/n, . . . , 0)^t, x_3 = (−1, . . . , −1)^t, x_4 = (10, . . . , 10)^t, x_5 = (100, . . . , 100)^t, x_6 = −x_4, and x_7 = −x_5. For this particular experiment (Tab. VII) we stop the process when ‖F(x_k)‖ ≤ 10⁻⁷, which is stronger than (14).

TABLE III  Iterations (IT), function evaluations (F), backtrackings (BK), and CPU time (T) for Newton-TFQMR

Function(n)                        IT          F               BK          T
1(1000) 1(5000) 1(10,000)          5 4 4       68 40 40        0 0 0       0.11 0.24 0.47
2(500) 2(1000) 2(2000)             6 4 4       545 116 116     0 0 0       1.25 0.5 0.9
3(50) 3(100) 3(200)                6 5 2       1713 1220 245   2 0 0       0.48 0.46 0.23
4(99) 4(399) 4(999)                6 6 7       103 101 131     2 2 2       0.05 0.08 0.29
5(1000) 5(5000) 5(10,000)          6 6 6       75 75 75        0 0 0       0.08 0.29 0.55
6(100) 6(500) 6(1000)              5 5 5       58 58 58        0 0 0       0.13 4.3 8.0
7(9) 7(99) 7(399)                  5 5 5       220 220 218     3 3 2       0.12 0.36 1.15
8(1000) 8(5000) 8(10,000)          * * *       * * *           * * *       * * *
9(2500) 9(5000) 9(10,000)          14 13 13    491 442 384     4 4 2       1.1 2.0 3.1
10(5000) 10(10,000) 10(15,000)     5 5 5       50 50 50        0 0 0       0.18 0.33 0.49
11(500) 11(1000) 11(2000)          5 5 5       68 66 76        0 0 0       0.15 0.18 0.35
12(100) 12(500) 12(1000)           6 6 6       93 85 85        1 1 1       0.04 0.19 0.23
13(100) 13(500) 13(1000)           5 5 5       65 70 74        1 2 2       0.08 0.3 0.6
14(100) 14(500) 14(1000)           5 6 5       65 99 70        1 2 2       0.09 0.35 0.49
15(500) 15(1000) 15(5000)          5 5 5       66 66 66        0 0 0       0.1 0.13 0.49
16(1000) 16(10,000) 16(50,000)     4 4 4       34 34 34        0 0 0       0.04 0.24 1.2
17(100) 17(500) 17(1000)           7 8 8       331 752 780     0 0 0       0.15 1.4 1.6
18(399) 18(999) 18(9999)           * * *       * * *           * * *       * * *
19(100) 19(500) 19(1000)           * 10 10     * 175 175       * 0 0       * 0.07 0.09
20(50) 20(100) 20(500)             14 14 15    329 329 375     0 0 0       2.354 6.089 92.373


TABLE IV  Iterations (IT), function evaluations (F), backtrackings (BK), and CPU time (T) for the SANE algorithm

Function(n)                        IT            F               BK          T
1(1000) 1(5000) 1(10,000)          11 5 5        23 11 11        0 0 0       0.05 0.09 0.17
2(500) 2(1000) 2(2000)             12 9 7        27 21 18        1 1 1       0.05 0.05 0.07
3(50) 3(100) 3(200)                46 80 47      94 163 100      1 2 5       0.08 0.15 0.1
4(99) 4(399) 4(999)                120 146 128   287 358 319     45 59 56    0.5 1.1 2.9
5(1000) 5(5000) 5(10,000)          35 35 35      74 74 74        2 2 2       0.12 0.43 0.8
6(100) 6(500) 6(1000)              8 8 8         17 17 17        0 0 0       0.16 1.55 9.0
7(9) 7(99) 7(399)                  12 12 12      37 37 37        4 4 4       0.04 0.15 0.48
8(1000) 8(5000) 8(10,000)          6 6 6         13 13 13        0 0 0       0.04 0.15 0.28
9(2500) 9(5000) 9(10,000)          19 16 17      42 36 39        1 1 1       0.38 0.65 1.25
10(5000) 10(10,000) 10(15,000)     6 6 6         13 13 13        0 0 0       0.09 0.16 0.24
11(500) 11(1000) 11(2000)          20 22 22      42 45 45        1 0 0       0.1 0.16 0.29
12(100) 12(500) 12(1000)           12 11 11      26 24 24        1 1 1       0.04 0.1 0.19
13(100) 13(500) 13(1000)           15 14 14      33 31 31        2 2 2       0.13 0.35 0.65
14(100) 14(500) 14(1000)           15 14 14      33 31 31        2 2 2       0.13 0.35 0.65
15(500) 15(1000) 15(5000)          16 16 16      34 34 34        1 1 1       0.06 0.11 0.39
16(1000) 16(10,000) 16(50,000)     6 6 6         13 13 13        0 0 0       0.02 0.13 0.65
17(100) 17(500) 17(1000)           59 105 125    133 245 305     13 24 32    0.12 0.29 0.6
18(399) 18(999) 18(9999)           116 114 80    281 278 198     42 44 31    0.38 0.65 4.3
19(100) 19(500) 19(1000)           ** 13 12      ** 28 26        ** 1 1      ** 0.04 0.051
20(50) 20(100) 20(500)             13 8 1        27 17 3         1 1 1       2.253 3.856 12.598

TABLE V  Number of problems for which each method is a winner, and failure percentage (FP) for each method

Method              IT    F     T     FP(%)
Newton-GMRES        8     9     8     8.33
Newton-Bi-CGSTAB    6     0     9     16.67
Newton-TFQMR        6     0     6     11.67
SANE                9     51    35    1.67

TABLE VI  SANE algorithm vs. Newton-GMRES

Method          IT    F     T
Newton-GMRES    51    9     19
SANE            9     51    41


TABLE VII  SANE algorithm for function 12 (Trigexp) for different initial points

n        Init   IT    F     BK    T
100      x0     19    39    1     0.080
100      x1     18    37    1     0.081
100      x2     18    37    1     0.080
100      x3     21    43    1     0.090
100      x4     30    63    1     0.130
100      x5     53    115   5     0.250
100      x6     40    87    5     0.171
100      x7     74    192   26    0.400
1000     x0     19    39    1     0.241
1000     x1     16    33    1     0.220
1000     x2     18    37    1     0.240
1000     x3     22    45    1     0.281
1000     x4     30    63    1     0.390
1000     x5     56    122   6     0.761
1000     x6     37    79    2     0.501
1000     x7     48    108   6     0.701
10,000   x0     20    41    1     3.485
10,000   x1     14    29    1     2.464
10,000   x2     18    37    1     3.124
10,000   x3     20    41    1     3.485
10,000   x4     30    63    1     5.407
10,000   x5     49    104   2     8.872
10,000   x6     40    86    3     7.211
10,000   x7     63    150   14    12.818

We observe, from Tables V and VI, that the SANE algorithm is a robust option for solving nonlinear systems of equations, and that it is very competitive for large-scale problems. We also observe from Table VII the global behavior of the SANE algorithm for a typical problem, although it requires more iterations when the initial guess is further away from the solution. Indeed, the SANE algorithm outperforms all the competitors in function evaluations and CPU time, and in some cases (Functions 8, 18 and 20) the difference is remarkable. On the other hand, due to its simplicity, it requires in general many more iterations than all the other methods. However, the SANE iterations are very inexpensive, and that explains the advantage in CPU time.

5 FINAL REMARKS

The SANE algorithm, proposed in this work, combines in a suitable way the direction ±F(x_k), the spectral choice of step length, and a nonmonotone globalization technique to produce a robust scheme for solving nonlinear systems of equations. The simplicity of the search directions accounts for the low computational cost per iteration, and the spectral choice of step length together with the globalization strategy are responsible for the effectiveness of the method. At every iteration it only requires two function evaluations and four inner products. Our preliminary numerical results indicate that the SANE algorithm is competitive with, and many times preferable to, recent and well-known implementations of Newton–Krylov methods.

The scalar |α_k| is closely related to the condition number of J_k [19]. As a consequence, for ill-conditioned problems it could become extremely large or extremely small at some iterations, producing inaccurate calculations that, in turn, require additional and unnecessary iterations to converge.


As an illustrative example of this behavior, consider the results obtained for function 4, for which the Jacobian matrix is almost singular at the solution.

We would like to comment on the choice of the parameter M in the SANE algorithm. We have tested the same set of functions with different values of M, ranging from 5 to 15. In general, we observed results similar to the ones presented in our numerical results section, except for problems with a singular or very ill-conditioned Jacobian at the solution. For these problems, the behavior of the method is very sensitive to the choice of M. For example, using M = 8 in Function 4 with n = 999, convergence is obtained after 109 iterations, 40 backtrackings, 264 function evaluations and 1.61 seconds of execution time. These results represent a significant improvement over the ones reported with M = 10.

If we know in advance that the symmetric part of J(x) is either positive definite (PD) or negative definite (ND) for all x ∈ Ω_0, then Step 2 of the algorithm and the computation of sgn_k can be avoided. To be precise, sgn_k = 1 or −1 for all k if the symmetric part of J(x) is PD or ND, respectively. Moreover, the line search at Step 6 can be replaced by the following combination of the GLL scheme with a recent proposition by Li and Fukushima [25]:

‖F_{k+1}‖ ≤ max_{0 ≤ j ≤ min(k,M)} ‖F_{k−j}‖ − γ λ² ‖F_k‖².

In this case, a drastic reduction in CPU time will be observed, since the method then requires only one function evaluation and three inner products per iteration. This interesting line search technique deserves further practical and theoretical investigation. In general, the line search used in the SANE algorithm has proved to be numerically effective. Nevertheless, there exists the theoretical possibility that the method converges to a point x̄ at which ‖F(x̄)‖ is away from zero but the vector F(x̄) is orthogonal to ∇f(x̄), which represents a failure of the method.
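In code, this alternative test only needs residual norms; a minimal sketch, with our own names, is:

```python
def lf_accepts(normF_hist, normF_trial, lam, normF_k, gamma=1e-4, M=10):
    """Derivative-free nonmonotone test of Li-Fukushima type discussed above."""
    return normF_trial <= max(normF_hist[-(M + 1):]) - gamma * lam**2 * normF_k**2
```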

Acknowledgments

The authors wish to thank José Mario Martínez for insightful discussions on this research. We are also indebted to an anonymous referee and Prof. Luigi Grippo, the Associate Editor, for valuable comments and suggestions. This work was partially supported by UCV-PROJECT 97-003769.

References

[1] J. Barzilai and J.M. Borwein (1988). Two point step size gradient methods. IMA J. Numer. Anal., 8, 141–148.
[2] S. Bellavia and B. Morini (2001). A globally convergent Newton-GMRES subspace method for systems of nonlinear equations. SIAM J. Sci. Comput., 23, 940–960.
[3] E.G. Birgin, R. Biloti, M. Tygel and L.T. Santos (1999). Restricted optimization: A clue to fast and accurate implementation of the common reflection surface method. Journal of Applied Geophysics, 42, 143–155.
[4] E.G. Birgin, I. Chambouleyron and J.M. Martínez (1999). Estimation of the optical constants and the thickness of thin films using unconstrained optimization. Journal of Computational Physics, 151, 862–880.
[5] E.G. Birgin and Y.G. Evtushenko (1998). Automatic differentiation and spectral projected gradient methods for optimal control problems. Optimization Meth. & Soft., 10, 125–146.
[6] P.N. Brown and Y. Saad (1990). Hybrid Krylov methods for nonlinear systems of equations. SIAM J. Sci. Comput., 11, 450–481.
[7] P.N. Brown and Y. Saad (1994). Convergence theory of nonlinear Newton–Krylov algorithms. SIAM J. Opt., 4, 297–330.
[8] C.G. Broyden (1965). A class of methods for solving nonlinear simultaneous equations. Math. Comp., 19, 577–593.
[9] Z. Castillo, D. Cores and M. Raydan (2000). Low cost optimization techniques for solving the nonlinear seismic reflection tomography problem. Optimization and Engineering, 1, 155–169.


[10] D. Cores, G. Fung and R. Michelena (2000). A fast and global two point low storage optimization technique for tracing rays in 2D and 3D isotropic media. Journal of Applied Geophysics, 45, 273–287.
[11] Y.H. Dai (2001). On nonmonotone line search. JOTA (to appear).
[12] R.S. Dembo, S.C. Eisenstat and T. Steihaug (1982). Inexact Newton methods. SIAM J. Numer. Anal., 19, 400–408.
[13] J.E. Dennis, Jr., J.M. Martínez and X. Zhang (1994). Triangular decomposition methods for solving reducible nonlinear systems of equations. SIAM J. Opt., 4, 358–382.
[14] J.E. Dennis, Jr. and H.F. Walker (1981). Convergence theorems for least-change secant update methods. SIAM J. Numer. Anal., 18, 949–987.
[15] J.E. Dennis, Jr. and R.B. Schnabel (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, NJ.
[16] S.C. Eisenstat and H.F. Walker (1994). Globally convergent inexact Newton methods. SIAM J. Opt., 4, 393–422.
[17] M.C. Ferris and S. Lucidi (1994). Nonmonotone stabilization methods for nonlinear equations. JOTA, 81, 53–71.
[18] R. Fletcher (1990). Low storage methods for unconstrained optimization. Lectures in Applied Mathematics (AMS), 26, 165–179.
[19] R. Fletcher (2001). On the Barzilai–Borwein method. Technical Report NA/207, Department of Mathematics, University of Dundee, Dundee, Scotland.
[20] R.W. Freund (1993). A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems. SIAM J. Sci. Comp., 14, 470–482.
[21] M. Gasparo (2000). A nonmonotone hybrid method for nonlinear systems. Optimization Meth. & Soft., 13, 79–94.
[22] M. Gomez-Ruggiero, J.M. Martínez and A. Moretti (1992). Comparing algorithms for solving sparse nonlinear systems of equations. SIAM J. Sci. Comp., 23, 459–483.
[23] L. Grippo, F. Lampariello and S. Lucidi (1986). A nonmonotone line search technique for Newton's method. SIAM J. Numer. Anal., 23, 707–716.
[24] C.T. Kelley (1995). Iterative Methods for Linear and Nonlinear Equations. SIAM, Philadelphia.
[25] D.H. Li and M. Fukushima (2000). A derivative-free line search and global convergence of Broyden-like method for nonlinear equations. Optimization Meth. & Soft., 13, 181–201.
[26] J.M. Martínez (1990). A family of quasi-Newton methods for nonlinear equations with direct secant updates of matrix factorizations. SIAM J. Numer. Anal., 27, 1034–1049.
[27] J.M. Martínez (1990). Local convergence theory of inexact Newton methods based on structured least change secant updates. Math. Comp., 55, 143–167.
[28] M. Mulato, I. Chambouleyron, E.G. Birgin and J.M. Martínez (2000). Determination of thickness and optical constants of a-Si:H films from transmittance data. Applied Physics Letters, 77, 2133–2135.
[29] J.M. Ortega and W.C. Rheinboldt (1970). Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York.
[30] M. Raydan (1993). On the Barzilai and Borwein choice of step length for the gradient method. IMA J. Numer. Anal., 13, 321–326.
[31] M. Raydan (1997). The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Opt., 7, 26–33.
[32] Y. Saad and M.H. Schultz (1986). GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7, 856–869.
[33] L.K. Schubert (1970). Modification of a quasi-Newton method for nonlinear equations with a sparse Jacobian. Math. Comp., 24, 27–30.
[34] A.H. Sherman (1978). On Newton-iterative methods for the solution of systems of nonlinear equations. SIAM J. Numer. Anal., 15, 755–771.
[35] H.A. Van Der Vorst (1992). Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of non-symmetric linear systems. SIAM J. Sci. Stat. Comput., 13, 631–644.

APPENDIX: TEST FUNCTIONS

We now list the test functions and the associated initial guess.

1. Exponential function 1: F_1(x) = (g_1(x), . . . , g_n(x))^t, where
g_1(x) = e^{x_1 − 1} − 1,
g_i(x) = i (e^{x_i − 1} − x_i), for i = 2, 3, . . . , n,
x_0 = (n/(n − 1), n/(n − 1), . . . , n/(n − 1))^t.


2. Exponential function 2: F_2(x) = (g_1(x), . . . , g_n(x))^t, where
g_1(x) = e^{x_1} − 1,
g_i(x) = (i/10) (e^{x_i} + x_{i−1} − 1), for i = 2, 3, . . . , n,
x_0 = (1/n², 1/n², . . . , 1/n²)^t.

3. Exponential function 3: F_3(x) = (g_1(x), . . . , g_n(x))^t, where
g_i(x) = (i/10) (1 − x_i² − e^{−x_i²}), for i = 2, 3, . . . , n − 1,
g_n(x) = (n/10) (1 − e^{−x_n²}),
x_0 = (1/(4n²), 2/(4n²), . . . , n/(4n²))^t.

4. Diagonal function premultiplied by a quasi-orthogonal matrix (n is a multiple of 3) [21, pp. 89, 90]: F_4(x) = (g_1(x), . . . , g_n(x))^t where, for i = 1, 2, . . . , n/3,
g_{3i−2}(x) = 0.6 x_{3i−2} + 1.6 x_{3i−2}³ − 7.2 x_{3i−1}² + 9.6 x_{3i−1} − 4.8,
g_{3i−1}(x) = 0.48 x_{3i−2} − 0.72 x_{3i−1}³ + 3.24 x_{3i−1}² − 4.32 x_{3i−1} − x_{3i} + 0.2 x_{3i}³ + 2.16,
g_{3i}(x) = 1.25 x_{3i} − 0.25 x_{3i}³,
x_0 = (−1, 1/2, −1, . . . , −1, 1/2, −1)^t.

5. Extended Rosenbrock function (n is even) [21, p. 89]: F_5(x) = (g_1(x), . . . , g_n(x))^t where, for i = 1, 2, . . . , n/2,
g_{2i−1}(x) = 10 (x_{2i} − x_{2i−1}²),
g_{2i}(x) = 1 − x_{2i−1},
x_0 = (5, 1, . . . , 5, 1)^t.

6. Chandrasekhar's H-equation [24, p. 198]:
F_6(H)(µ) = H(µ) − ( 1 − (c/2) ∫₀¹ [µ H(ν) / (µ + ν)] dν )^{−1} = 0.
The discretized version is F_6(x) = (g_1(x), . . . , g_n(x))^t, where
g_i(x) = x_i − ( 1 − (c/2n) Σ_{j=1}^{n} [µ_i x_j / (µ_i + µ_j)] )^{−1}, for i = 1, 2, . . . , n,
x_0 = (1, 1, . . . , 1)^t, with c ∈ [0, 1) and µ_i = (i − 1/2)/n, for 1 ≤ i ≤ n. (In our experiments we take c = 0.9.)


7. Badly scaled augmented Powell's function (n is a multiple of 3) [21, p. 89]: F_7(x) = (g_1(x), . . . , g_n(x))^t where, for i = 1, 2, . . . , n/3,
g_{3i−2}(x) = 10⁴ x_{3i−2} x_{3i−1} − 1,
g_{3i−1}(x) = exp(−x_{3i−2}) + exp(−x_{3i−1}) − 1.0001,
g_{3i}(x) = φ(x_{3i}),
x_0 = (10⁻³, 18, 1, 10⁻³, 18, 1, . . .)^t, with
φ(t) = 0.5t − 2, if t ≤ −1,
φ(t) = (−592t³ + 888t² + 4551t − 1924)/1998, if −1 < t < 2,
φ(t) = 0.5t + 2, if t ≥ 2.

8. Trigonometric function: F_8(x) = (g_1(x), . . . , g_n(x))^t where, for i = 1, . . . , n,
g_i(x) = 2 ( n + i(1 − cos x_i) − sin x_i − Σ_{j=1}^{n} cos x_j ) (2 sin x_i − cos x_i),
x_0 = (101/(100n), . . . , 101/(100n))^t.

9. Singular function: F_9(x) = (g_1(x), . . . , g_n(x))^t where
g_1(x) = (1/3) x_1³ + (1/2) x_2²,
g_i(x) = −(1/2) x_i² + (i/3) x_i³ + (1/2) x_{i+1}², for i = 2, 3, . . . , n − 1,
g_n(x) = −(1/2) x_n² + (n/3) x_n³,
x_0 = (1, 1, . . . , 1)^t.

10. Logarithmic function: F_10(x) = (g_1(x), . . . , g_n(x))^t where
g_i(x) = ln(x_i + 1) − x_i/n, for i = 1, 2, . . . , n,
x_0 = (1, 1, . . . , 1)^t.

11. Broyden Tridiagonal function [22, pp. 471, 472]: F_11(x) = (g_1(x), . . . , g_n(x))^t where
g_1(x) = (3 − 0.5 x_1) x_1 − 2 x_2 + 1,
g_i(x) = (3 − 0.5 x_i) x_i − x_{i−1} − 2 x_{i+1} + 1, for i = 2, 3, . . . , n − 1,
g_n(x) = (3 − 0.5 x_n) x_n − x_{n−1} + 1,
x_0 = (−1, −1, . . . , −1)^t.


12. Trigexp function [22, p. 473]: F_12(x) = (g_1(x), . . . , g_n(x))^t where
g_1(x) = 3x_1³ + 2x_2 − 5 + sin(x_1 − x_2) sin(x_1 + x_2),
g_i(x) = −x_{i−1} e^{(x_{i−1} − x_i)} + x_i (4 + 3x_i²) + 2x_{i+1} + sin(x_i − x_{i+1}) sin(x_i + x_{i+1}) − 8, for i = 2, 3, . . . , n − 1,
g_n(x) = −x_{n−1} e^{(x_{n−1} − x_n)} + 4x_n − 3,
x_0 = (0, 0, . . . , 0)^t.

13. Variable band function 1 [22, p. 474]: F_13(x) = (g_1(x), . . . , g_n(x))^t where
g_1(x) = −2x_1² + 3x_1 − 2x_2 + 0.5x_{α_1} + 1,
g_i(x) = −2x_1² + 3x_i − x_{i−1} − 2x_{i+1} + 0.5x_{α_i} + 1, for i = 2, 3, . . . , n − 1,
g_n(x) = −2x_n² + 3x_n − x_{n−1} + 0.5x_{α_n} + 1,
x_0 = (0, 0, . . . , 0)^t,
and α_i is a random integer in [α_i^min, α_i^max], where α_i^min = max[1, i − 2] and α_i^max = min[n, i + 2], for all i.

14. Variable band function 2 [22, p. 474]: F_14(x) = (g_1(x), . . . , g_n(x))^t where
g_1(x) = −2x_1² + 3x_1 − 2x_2 + 0.5x_{α_1} + 1,
g_i(x) = −2x_1² + 3x_i − x_{i−1} − 2x_{i+1} + 0.5x_{α_i} + 1, for i = 2, 3, . . . , n − 1,
g_n(x) = −2x_n² + 3x_n − x_{n−1} + 0.5x_{α_n} + 1,
x_0 = (0, 0, . . . , 0)^t,
and α_i is a random integer in [α_i^min, α_i^max], where α_i^min = max[1, i − 10] and α_i^max = min[n, i + 10], for all i.

15. Function 15 [22, p. 475]: F_15(x) = (g_1(x), . . . , g_n(x))^t where
g_1(x) = −2x_1² + 3x_1 + 3x_{n−4} − x_{n−3} − x_{n−2} + 0.5x_{n−1} − x_n + 1,
g_i(x) = −2x_i² + 3x_i − x_{i−1} − 2x_{i+1} + 3x_{n−4} − x_{n−3} − x_{n−2} + 0.5x_{n−1} − x_n + 1, for i = 2, 3, . . . , n − 1,
g_n(x) = −2x_n² + 3x_n − x_{n−1} + 3x_{n−4} − x_{n−3} − x_{n−2} + 0.5x_{n−1} − x_n + 1,
x_0 = (−1, −1, . . . , −1)^t.

16. Strictly convex function 1 [31, p. 29]: F_16(x) = (g_1(x), . . . , g_n(x))^t is the gradient of h(x) = Σ_{i=1}^{n} (e^{x_i} − x_i),
g_i(x) = e^{x_i} − 1, for i = 1, 2, . . . , n,
x_0 = (1/n, 2/n, . . . , 1)^t.


17. Strictly convex function 2 [31, p. 30]: F_17(x) = (g_1(x), . . . , g_n(x))^t is the gradient of h(x) = Σ_{i=1}^{n} (i/10)(e^{x_i} − x_i),
g_i(x) = (i/10)(e^{x_i} − 1), for i = 1, 2, . . . , n,
x_0 = (1, 1, . . . , 1)^t.

18. Function 18 (n is a multiple of 3): F_18(x) = (g_1(x), . . . , g_n(x))^t where, for i = 1, 2, . . . , n/3,
g_{3i−2}(x) = x_{3i−2} x_{3i−1} − x_{3i}² − 1,
g_{3i−1}(x) = x_{3i−2} x_{3i−1} x_{3i} − x_{3i−2}² + x_{3i−1}² − 2,
g_{3i}(x) = e^{−x_{3i−2}} − e^{−x_{3i−1}},
x_0 = (0, 0, . . . , 0)^t.

19. Zero Jacobian function: F_19(x) = (g_1(x), . . . , g_n(x))^t, where
g_1(x) = Σ_{j=1}^{n} x_j²,
g_i(x) = −2 x_1 x_i, for i = 2, . . . , n,
x_{0,1} = 100(n − 100)/n, and for all i ≥ 2, x_{0,i} = (n − 1000)(n − 500)/(60n)².

20. Geometric programming function: F_20(x) = (g_1(x), . . . , g_n(x))^t where, for i = 1, . . . , n,
g_i(x) = Σ_{t=1}^{5} 0.2 t x_i^{0.2t−1} ( Π_{k=1, k≠i}^{n} x_k^{0.2t} ),
x_0 = (1, 1, . . . , 1)^t.
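As an illustration of how these problems plug into the spectral scheme, here is the Broyden Tridiagonal function (Problem 11) coded directly from the definition above, together with a hypothetical call to the sane sketch from Section 3; the helper names are ours and are not part of the original test suite.

```python
import numpy as np

def broyden_tridiagonal(x):
    """Problem 11: g_i(x) = (3 - 0.5 x_i) x_i - x_{i-1} - 2 x_{i+1} + 1 (boundary terms omitted)."""
    g = (3.0 - 0.5 * x) * x + 1.0
    g[:-1] -= 2.0 * x[1:]      # the -2 x_{i+1} terms (absent in g_n)
    g[1:] -= x[:-1]            # the -x_{i-1} terms (absent in g_1)
    return g

n = 2000
x0 = -np.ones(n)               # x_0 = (-1, ..., -1)^t, as specified above
# x_star = sane(broyden_tridiagonal, x0)   # e.g., using the SANE sketch from Section 3
```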
