Numerische Mathematik
© Springer-Verlag 1995
Numer. Math. 69: 483-493 (1995)
Electronic Edition
Numerical solution of a secular equation
A. Melman
Department of Industrial Engineering and Management, Ben-Gurion University, IL-Beer-Sheva 84105, Israel Fax: 972-7-280776; e-mail:
[email protected] Received July 1, 1994
Summary. A method is proposed for the solution of a secular equation, arising in modified symmetric eigenvalue problems and in several other areas. This equation has singularities which make the application of standard root-finding methods difficult. In order to solve the equation, a class of transformations of variables is considered, which transform the equation into one for which Newton's method converges from any point in a certain given interval. In addition, the form of the transformed equation suggests a convergence accelerating modification of Newton's method. The same ideas are applied to the secant method and numerical results are presented.
Mathematics Subject Classification (1991): 65F15, 65H05
1. Introduction
The following equation arises when modifying symmetric eigenvalue problems (Golub (1973)):
$$1 + \rho \sum_{j=1}^{n} \frac{b_j^2}{d_j - \lambda} = 0 . \tag{1}$$
Assuming that the $b_j$'s are all nonzero and that the $d_j$'s are distinct, this function has $n$ roots, separated by the $n$ values $d_j$. These roots are the eigenvalues of the real symmetric matrix $D + \rho b b^T$, where $\rho \in \mathbb{R}$, $b = [b_1, b_2, \ldots, b_n]^T \in \mathbb{R}^n$ and $D = \mathrm{diag}\{d_1, d_2, \ldots, d_n\}$. This equation is therefore a "secular equation" for the matrix $D + \rho b b^T$ (Golub (1973)). Equations like this play a role, e.g., in updating the singular value decomposition of matrices (Bunch, Nielsen (1978)). Related secular equations are found when solving least squares type problems (Chan, Olkin, Cooley (1992), Forsythe, Golub (1965), Gander (1981), Gander, Golub, von Matt (1989), Golub, von Matt (1991), Hanson, Phillips (1975), Reinsch (1967), Reinsch (1971), von Matt (1993)). A similar equation appears in invariant subspace computations (Fuhrmann (1988)), or when using the "escalator method" for computing the eigenvalues of a matrix (Fadeeva (1959), pp. 183-192). As an example, Figure 1 gives the plot of the function
$$h(\lambda) = 1 + \frac{0.1}{-2-\lambda} + \frac{0.5}{-0.5-\lambda} + \frac{0.2}{-\lambda} + \frac{0.2}{1.2-\lambda} + \frac{0.4}{2-\lambda} .$$
Numerische Mathematik Electronic Edition. Page numbers may differ from the printed version. Page 483 of Numer. Math. 69: 483-493 (1995)
Note that the smaller $b_i^2$, the closer the appropriate root is to either $d_i$ or $d_{i+1}$. In order to solve this secular equation, any method with fast local convergence can be used, if it is suitably safeguarded. The method of choice is usually Newton's method. However, this method is based on a local linear model of the function whose roots are to be computed. It would seem that better results could be obtained by using a model more closely resembling the function concerned. Such an idea formed the basis for an algorithm presented in Bunch, Nielsen, Sorensen (1978). In that article, an efficient algorithm was proposed which (like Newton's method) requires at each iteration the value of the function and its derivative. This method, which we shall refer to as the "BNS method", converges from any suitably chosen point (avoiding the need for safeguards) with a quadratic order of convergence, but its convergence analysis is quite involved. Our approach differs in that first a transformation of variables is performed, which transforms the function into one for which both Newton's method and the secant method converge from any point in a given interval, so that here too no safeguards are required. A further advantage of this approach is a simplified convergence analysis, which enables us to identify modifications of Newton's method and the secant method that improve the speed of convergence. It was found that these methods are faster than the one used by Bunch et al. Since the given problem typically has to be solved many times in the course of a particular application, the savings in computing time might be quite significant.
The paper is organized as follows. In Sect. 2, we derive sufficient conditions for a transformation of variables to yield a problem suitable for solution by Newton's method without safeguards and also propose an improved Newton method. In Sect. 3, the same is done for the secant method. Numerical results are presented in Sect. 4.
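As an illustration of the root structure described in this section, the following Python sketch locates the roots of the example function $h(\lambda)$ of Fig. 1 by plain bisection, one root per interval between consecutive poles and one to the right of the last pole. This is not the method proposed in this paper. The data are those of $h$ as read from the text with $\rho = 1$ assumed, and the names `h`, `bisect` and `secular_roots` are illustrative. The right end of the last bracket uses the bound $d_n + \rho \sum_j b_j^2$, at which $h \ge 0$ when $\rho > 0$.

```python
# Poles d_j and weights b_j^2 of the example function h (read from the text);
# rho = 1 is an assumption made for this illustration.
RHO = 1.0
D = [-2.0, -0.5, 0.0, 1.2, 2.0]
B2 = [0.1, 0.5, 0.2, 0.2, 0.4]


def h(lam):
    """Secular function 1 + rho * sum_j b_j^2 / (d_j - lam)."""
    return 1.0 + RHO * sum(b2 / (d - lam) for d, b2 in zip(D, B2))


def bisect(f, lo, hi, iterations=100):
    """Bisection for an increasing f with f < 0 near lo and f >= 0 near hi.

    f is only evaluated at midpoints, never at the end points themselves,
    so the poles at the ends of a bracket are not touched.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def secular_roots():
    """One root in each (d_j, d_{j+1}) and one in (d_n, d_n + rho * sum b_j^2]."""
    upper_ends = D[1:] + [D[-1] + RHO * sum(B2)]
    return [bisect(h, lo, hi) for lo, hi in zip(D, upper_ends)]


if __name__ == "__main__":
    for root in secular_roots():
        print(f"root = {root:.12f}, h(root) = {h(root):.2e}")
```

As a consistency check, the sum of the computed roots can be compared with the trace $\sum_j d_j + \rho \sum_j b_j^2$ of $D + \rho b b^T$, since the roots are the eigenvalues of that matrix.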
2. Convexifying transformations and Newton's method
We start by defining the function $f(\lambda)$:
$$f(\lambda) = 1 + \rho \sum_{j=1}^{n} \frac{b_j^2}{d_j - \lambda} . \tag{2}$$
In order to compute the $i$-th root of $f(\lambda)$, we set $\lambda = d_i + \mu$ and $\delta_j = d_j - d_i$, and we define
$$f_i(\mu) = 1 + \rho \sum_{j=1}^{n} \frac{b_j^2}{\delta_j - \mu} , \tag{3}$$
as in Bunch, Nielsen, Sorensen (1978). We assume that the $d_j$'s are distinct and therefore that $\delta_1 < \delta_2 < \cdots < \delta_{i-1} < \delta_i = 0 < \delta_{i+1} < \cdots < \delta_n$. Our problem then is to solve the equation $f_i(\mu) = 0$ on the interval $(0, \delta_{i+1})$. We assume w.l.o.g. that $\rho > 0$. Should this not be the case, then $d_i$ can be replaced by $-d_{n-i+1}$ and $\rho$ by $-\rho$. We also assume that the $b_j$'s are all nonzero. We start with $1 \le i < n$. The special case $i = n$ will be treated separately. Figure 2 gives the plot of
$$h_1(\mu) = 1 + \frac{0.1}{-\mu} + \frac{0.5}{1.5-\mu} + \frac{0.2}{2-\mu} + \frac{0.2}{3.2-\mu} + \frac{0.4}{4-\mu} .$$
The root of this function on the interval $(0, \delta_2)$ determines the first root of $h(\lambda)$, the function plotted in Fig. 1.
Fig. 1. The function $h(\lambda)$
Fig. 2. The function $h_1(\mu)$
The reason that Newton's method is not globally convergent when used for a function like $f_i$ is that this function is not "properly shaped". It is not hard to understand (see Fig. 2) why that method might not converge on the interval of interest if the initial point is not close enough to the root. We will propose
a class of transformations of the variable $\mu$ yielding a convex expression for $f_i$. This convexity ensures the convergence of Newton's method from any initial point in a certain given interval. We will also show how to obtain such an initial point. All transformations will implicitly be assumed to be twice continuously differentiable and to be proper, i.e., they are one to one and their range (possibly including $\pm\infty$) is sufficient to cover the values of the original variable. We start with the following lemma.
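To make the difficulty concrete, the following Python sketch applies one step of Newton's method directly to the untransformed example function $h_1(\mu)$ from a starting point inside $(0, \delta_2)$. The data are those of $h_1$ as read from the text with $\rho = 1$ assumed. The starting point $\mu_0 = 0.5$ and the function names are hypothetical choices made for this illustration.

```python
# Shifted poles delta_j = d_j - d_1 and weights b_j^2 of the example h_1
# (read from the text); rho = 1 is an assumption made for this illustration.
RHO = 1.0
DELTA = [0.0, 1.5, 2.0, 3.2, 4.0]
B2 = [0.1, 0.5, 0.2, 0.2, 0.4]


def h1(mu):
    """h_1(mu) = 1 + rho * sum_j b_j^2 / (delta_j - mu)."""
    return 1.0 + RHO * sum(b2 / (dl - mu) for dl, b2 in zip(DELTA, B2))


def dh1(mu):
    """Derivative of h_1; it is positive wherever h_1 is defined."""
    return RHO * sum(b2 / (dl - mu) ** 2 for dl, b2 in zip(DELTA, B2))


def newton_step(mu):
    """One unsafeguarded Newton step for h_1(mu) = 0."""
    return mu - h1(mu) / dh1(mu)


if __name__ == "__main__":
    mu0 = 0.5  # hypothetical starting point inside (0, delta_2) = (0, 1.5)
    mu1 = newton_step(mu0)
    inside = 0.0 < mu1 < DELTA[1]
    print(f"mu0 = {mu0}, mu1 = {mu1:.6f}, still inside (0, delta_2): {inside}")
```

With these data the first iterate is negative, so it already lies outside $(0, \delta_2)$ and a safeguard would be needed to continue.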
Lemma 2.1 (Sufficient condition for convexity of $f_i\left(\frac{1}{w(\gamma)}\right)$). The transformation of variables $\mu = \frac{1}{w(\gamma)}$ transforms $f_i(\mu)$ into a convex function $F_i(\gamma)$, if $w''(\gamma) \le 0$ for all $\gamma$ such that $w(\gamma) > \frac{1}{\delta_{i+1}}$.

Proof. The function $f_i$ is composed of a constant plus terms of the form $\frac{\rho b_j^2}{\delta_j - \mu}$, and we first consider the case $j \ne i$, which means that $\delta_j \ne 0$. After the transformation $\mu = \frac{1}{w(\gamma)}$, these terms become
$$\frac{\rho b_j^2}{\delta_j - \frac{1}{w(\gamma)}} = \frac{\rho b_j^2\, w(\gamma)}{\delta_j w(\gamma) - 1} = \frac{\rho b_j^2}{\delta_j}\,\frac{w(\gamma)}{w(\gamma) - \frac{1}{\delta_j}} = \frac{\rho b_j^2}{\delta_j} + \frac{\rho b_j^2/\delta_j^2}{w(\gamma) - \frac{1}{\delta_j}} . \tag{4}$$
Taking the second derivative w.r.t. $\gamma$ of this expression yields:
$$\frac{\rho b_j^2}{\delta_j^2} \left( \frac{2\,\bigl(w'(\gamma)\bigr)^2}{\bigl(w(\gamma) - \frac{1}{\delta_j}\bigr)^3} - \frac{w''(\gamma)}{\bigl(w(\gamma) - \frac{1}{\delta_j}\bigr)^2} \right) .$$
Since $\frac{1}{\delta_{i+1}} \ge \frac{1}{\delta_j}$ for all $j \ne i$ and since $w(\gamma) > \frac{1}{\delta_{i+1}}$, it is clear that this last expression will be nonnegative if $w''(\gamma) \le 0$. For $j = i$ (and noting that $\delta_i = 0$), our transformation transforms $\frac{\rho b_i^2}{\delta_i - \mu}$ into $-\rho b_i^2 w(\gamma)$, and the second derivative of this expression is nonnegative for $w''(\gamma) \le 0$. □

Simple examples of such transformations can be obtained, e.g., by taking for $w(\gamma)$ a function of the form $\gamma^p$, where $0 < p \le 1$. Figure 3 gives the plot of $H_1(\gamma) = h_1\left(\frac{1}{\gamma}\right)$, where $h_1(\mu)$ was the function plotted in Fig. 2. The following two theorems show how convexifying transformations can be used to obtain global convergence properties for Newton's method when applied to finding the root of $f_i$.

Theorem 2.1. Let the real function $F(x)$ be convex, decreasing (resp. increasing) and twice continuously differentiable on the closed finite interval $[a, b]$, and let $F(a)F(b) < 0$. Then Newton's method converges monotonically to the unique solution $x^*$ of $F(x) = 0$ on $[a, b]$ from any initial point in $[a, x^*]$ (resp. $[x^*, b]$).

Proof. This theorem is a special case of Theorem 4.8 in Henrici (1964) and the proof is therefore immediate. It can also be easily understood from the geometric interpretation of Newton's method. □
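The statements of Lemma 2.1 and Theorem 2.1 can be tried out on the example. The following Python sketch uses the simplest admissible choice $w(\gamma) = \gamma$ (that is, $p = 1$), so that $H_1(\gamma) = h_1(1/\gamma)$, and runs plain Newton iterations from a point just to the right of $1/\delta_2$. The data ($\rho = 1$ and the values of $\delta_j$ and $b_j^2$ read from the text), the starting point, the tolerance and all function names are assumptions made for this illustration.

```python
# Example data for h_1 (read from the text); rho = 1 is an assumption.
RHO = 1.0
DELTA = [0.0, 1.5, 2.0, 3.2, 4.0]
B2 = [0.1, 0.5, 0.2, 0.2, 0.4]


def h1(mu):
    return 1.0 + RHO * sum(b2 / (dl - mu) for dl, b2 in zip(DELTA, B2))


def dh1(mu):
    return RHO * sum(b2 / (dl - mu) ** 2 for dl, b2 in zip(DELTA, B2))


def H1(gamma):
    """Transformed function H_1(gamma) = h_1(1 / gamma), i.e. w(gamma) = gamma."""
    return h1(1.0 / gamma)


def dH1(gamma):
    """Chain rule: d/dgamma h_1(1/gamma) = -h_1'(1/gamma) / gamma^2."""
    return -dh1(1.0 / gamma) / gamma ** 2


def newton_iterates(gamma0, tol=1e-12, max_iter=100):
    """Unsafeguarded Newton iterates for H_1(gamma) = 0, starting at gamma0."""
    iterates = [gamma0]
    while abs(H1(iterates[-1])) > tol and len(iterates) <= max_iter:
        g = iterates[-1]
        iterates.append(g - H1(g) / dH1(g))
    return iterates


if __name__ == "__main__":
    # Start just to the right of 1/delta_2, i.e. mu just below delta_2.
    its = newton_iterates(1.0 / DELTA[1] + 1e-3)
    for k, g in enumerate(its):
        print(f"k = {k:2d}  gamma = {g:.12f}  mu = {1.0 / g:.12f}")
```

The iterates increase towards the root and never leave the interval, in contrast to Newton's method applied to $h_1$ directly, where an iterate can fall outside $(0, \delta_2)$.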
Notation. In what follows we denote the unique root of $F_i(\gamma) \equiv f_i\left(\frac{1}{w(\gamma)}\right)$ by $\gamma^*$.
Fig. 3. The function $H_1(\gamma) = h_1\left(\frac{1}{\gamma}\right)$
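Theorem 2.2. Let $\mu = \frac{1}{w(\gamma)}$ be a proper transformation of variables, with $w'(\gamma) \ne 0$ and $w''(\gamma) \le 0$ for all $\gamma$ such that $w(\gamma) > \frac{1}{\delta_{i+1}} > 0$. Then Newton's method, applied to $F_i(\gamma) = 0$ with $i < n$, converges monotonically to $\gamma^*$ from any point in $\left(w^{-1}\left(\frac{1}{\delta_{i+1}}\right), \gamma^*\right]$ or $\left[\gamma^*, w^{-1}\left(\frac{1}{\delta_{i+1}}\right)\right)$, depending on whether $w'(\gamma) > 0$ or $w'(\gamma) < 0$, respectively.

Proof. The theorem will be proved for the case $w'(\gamma) > 0$. The proof for $w'(\gamma) < 0$ is completely analogous. From (4), the function $F_i(\gamma) = f_i\left(\frac{1}{w(\gamma)}\right)$ of which we are computing the root, is given by
$$F_i(\gamma) = 1 + \rho \sum_{\substack{j=1 \\ j \ne i}}^{n} \frac{b_j^2}{\delta_j} - \rho b_i^2 w(\gamma) + \rho \sum_{\substack{j=1 \\ j \ne i}}^{n} \frac{b_j^2/\delta_j^2}{w(\gamma) - \frac{1}{\delta_j}} .$$
Taking the derivative w.r.t. $\gamma$ yields:
$$F_i'(\gamma) = -\rho \left( b_i^2 + \sum_{\substack{j=1 \\ j \ne i}}^{n} \frac{b_j^2/\delta_j^2}{\bigl(w(\gamma) - \frac{1}{\delta_j}\bigr)^2} \right) w'(\gamma) < 0 . \tag{5}$$
This means that $F_i(\gamma)$ is strictly decreasing. Since $\mu \in (0, \delta_{i+1})$, we have that $w(\gamma) \in \left(\frac{1}{\delta_{i+1}}, +\infty\right)$ and therefore, because $w(\gamma)$ is increasing, $\gamma \in \left(w^{-1}\left(\frac{1}{\delta_{i+1}}\right), +\infty\right)$. From the expression for $F_i(\gamma)$ it is clear that as $\gamma \to w^{-1}\left(\frac{1}{\delta_{i+1}}\right)$, $F_i(\gamma) \to +\infty$, and that as $\gamma \to +\infty$, $F_i(\gamma) \to -\infty$. We therefore know that $F_i(\gamma)$ has a unique finite root on $\left(w^{-1}\left(\frac{1}{\delta_{i+1}}\right), +\infty\right)$. Since, in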
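addition, $F_i(\gamma)$ is convex by Lemma 2.1, applying Theorem 2.1 concludes the proof. □

To complement the previous theorem, we shall construct a point on the left-hand side of the root $\gamma^*$ for the case where $w'(\gamma) > 0$. A point on the right-hand side of $\gamma^*$ can be constructed analogously when $w'(\gamma) < 0$. Since for all $\gamma$:
$$F_i(\gamma) = 1 + \rho \sum_{\substack{j=1 \\ j \ne i}}^{n} \frac{b_j^2}{\delta_j} - \rho b_i^2 w(\gamma) + \rho \sum_{\substack{j=1 \\ j \ne i}}^{n} \frac{b_j^2/\delta_j^2}{w(\gamma) - \frac{1}{\delta_j}} \;\ge\; 1 + \rho \sum_{\substack{j=1 \\ j \ne i}}^{n} \frac{b_j^2}{\delta_j} - \rho b_i^2 w(\gamma) + \frac{\rho b_{i+1}^2/\delta_{i+1}^2}{w(\gamma) - \frac{1}{\delta_{i+1}}} ,$$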
the root of the function on the right-hand side of the inequality must lie to the left of the root $\gamma^*$ (and to the right of $w^{-1}\left(\frac{1}{\delta_{i+1}}\right)$). This root can be found by first solving a quadratic equation, and then applying the inverse of $w$.

For a transformation satisfying the conditions of Theorem 2.2, the form of $F_i(\gamma)$ enables us to construct an improved version of Newton's method. We proceed as follows. The $(k+1)$-th Newton iterate is given by the solution to
$$F_i(\gamma_k) + F_i'(\gamma_k)(\gamma - \gamma_k) = 0 , \tag{6}$$
where $\gamma_k$ is the $k$-th iterate. These iterates are obtained by a linear approximation to $F_i(\gamma)$. A better performance could therefore be expected if we keep the "most nonlinear" term exactly in the approximation and approximate linearly only the other terms. Close to the singularity $w^{-1}\left(\frac{1}{\delta_{i+1}}\right)$, which is the most troublesome area for a linear approximation, the dominant term is given by $\frac{\rho b_{i+1}^2/\delta_{i+1}^2}{w(\gamma) - \frac{1}{\delta_{i+1}}}$, and it is this term which we will include in the approximation. Denoting the sum of the remaining terms by $R_i(\gamma)$, i.e.,
$$F_i(\gamma) = R_i(\gamma) + \frac{\rho b_{i+1}^2/\delta_{i+1}^2}{w(\gamma) - \frac{1}{\delta_{i+1}}} ,$$
we have the following theorem.

Theorem 2.3. Let $w(\gamma)$ satisfy the conditions of Theorem 2.2 with $w'(\gamma) > 0$ and let $\{\gamma_k\}$ be the sequence of iterates generated by Newton's method for the equation $F_i(\gamma) = 0$, with $i < n$ and with $\gamma_0 \in \left(w^{-1}\left(\frac{1}{\delta_{i+1}}\right), \gamma^*\right]$. Then the sequence $\{\bar{\gamma}_k\}$ with
$$\bar{\gamma}_{k+1} = \left\{ \gamma \;\middle|\; w(\gamma) > \frac{1}{\delta_{i+1}} \ \text{and}\ R_i(\bar{\gamma}_k) + R_i'(\bar{\gamma}_k)(\gamma - \bar{\gamma}_k) + \frac{\rho b_{i+1}^2/\delta_{i+1}^2}{w(\gamma) - \frac{1}{\delta_{i+1}}} = 0 \right\}$$
and $\bar{\gamma}_0 = \gamma_0$, converges to $\gamma^*$ at least as fast as the sequence $\{\gamma_k\}$. An analogous result is obtained for $w'(\gamma) < 0$.

Proof. We start by noting that, by similar arguments as in the proof of Theorem 2.2, the iterates $\bar{\gamma}_k$ are well-defined. First we prove that for $\bar{\gamma}_k = \gamma_k$: $\gamma^* - \bar{\gamma}_{k+1} \le \gamma^* - \gamma_{k+1}$. From the convexity of $R_i(\gamma)$, we have: $R_i(\gamma) \ge R_i(\gamma_k) + R_i'(\gamma_k)(\gamma - \gamma_k)$.
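Furthermore, defining
$$G_i(\gamma) = R_i(\gamma_k) + R_i'(\gamma_k)(\gamma - \gamma_k) + \frac{\rho b_{i+1}^2/\delta_{i+1}^2}{w(\gamma) - \frac{1}{\delta_{i+1}}}$$
and
$$A_i(\gamma) = F_i(\gamma_k) + F_i'(\gamma_k)(\gamma - \gamma_k) ,$$
one has that
$$G_i(\gamma_k) = R_i(\gamma_k) + \frac{\rho b_{i+1}^2/\delta_{i+1}^2}{w(\gamma_k) - \frac{1}{\delta_{i+1}}} = F_i(\gamma_k) = A_i(\gamma_k)$$
and
$$G_i'(\gamma_k) = R_i'(\gamma_k) - \frac{\rho b_{i+1}^2}{\delta_{i+1}^2}\,\frac{w'(\gamma_k)}{\bigl(w(\gamma_k) - \frac{1}{\delta_{i+1}}\bigr)^2} = F_i'(\gamma_k) = A_i'(\gamma_k) .$$
Therefore $A_i(\gamma)$ is the linear approximation to both $F_i(\gamma)$ and $G_i(\gamma)$ at $\gamma_k$. Now, since the function $G_i(\gamma)$ is convex, $G_i(\gamma) \ge A_i(\gamma)$. Putting all of this together, we obtain the following inequalities:
$$F_i(\gamma) \;\ge\; R_i(\gamma_k) + R_i'(\gamma_k)(\gamma - \gamma_k) + \frac{\rho b_{i+1}^2/\delta_{i+1}^2}{w(\gamma) - \frac{1}{\delta_{i+1}}} \;\ge\; F_i(\gamma_k) + F_i'(\gamma_k)(\gamma - \gamma_k) .$$
The functions $G_i(\gamma)$ and $A_i(\gamma)$ are strictly decreasing and therefore each have a unique root (see Fig. 4).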
Fig. 4. The functions $F_i(\gamma)$, $G_i(\gamma)$ and $A_i(\gamma)$
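The construction of a starting point from the quadratic lower bound and the modified iteration of Theorem 2.3 can be sketched in Python as follows, for the example $h_1$ with $i = 1$ and the choice $w(\gamma) = \gamma$. With this $w$, both the lower bound and $G_i$ reduce, in the variable $t = \gamma - \frac{1}{\delta_{i+1}}$, to a quadratic $\beta t^2 + a_1 t + c = 0$ with $\beta < 0 < c$, which has exactly one positive root. The data ($\rho = 1$ and the values read from the text), the choice of $w$, the stopping tolerance and all names (`positive_root`, `starting_point`, `improved_step`, `iterate`) are assumptions of this sketch and are not taken from the paper.

```python
import math

# Example h_1 from the text: shifted poles delta_j and weights b_j^2.
# Assumptions of this sketch: rho = 1 and w(gamma) = gamma.
RHO = 1.0
DELTA = [0.0, 1.5, 2.0, 3.2, 4.0]
B2 = [0.1, 0.5, 0.2, 0.2, 0.4]
I = 0                                    # zero-based position of delta_i = 0
OTHERS = [j for j in range(len(DELTA)) if j != I]
REST = [j for j in OTHERS if j != I + 1]
S = 1.0 / DELTA[I + 1]                   # singularity 1/delta_{i+1}
C = RHO * B2[I + 1] / DELTA[I + 1] ** 2  # numerator of the dominant term
CONST = 1.0 + RHO * sum(B2[j] / DELTA[j] for j in OTHERS)


def R(g):
    """R_i(gamma): F_i(gamma) without the dominant term C / (gamma - S)."""
    tail = sum(B2[j] / DELTA[j] ** 2 / (g - 1.0 / DELTA[j]) for j in REST)
    return CONST - RHO * B2[I] * g + RHO * tail


def dR(g):
    tail = sum(B2[j] / DELTA[j] ** 2 / (g - 1.0 / DELTA[j]) ** 2 for j in REST)
    return -RHO * (B2[I] + tail)


def F(g):
    return R(g) + C / (g - S)


def dF(g):
    return dR(g) - C / (g - S) ** 2


def positive_root(beta, a1, c):
    """The unique t > 0 with beta*t^2 + a1*t + c = 0, assuming beta < 0 < c."""
    return (-a1 - math.sqrt(a1 * a1 - 4.0 * beta * c)) / (2.0 * beta)


def starting_point():
    """Root of the lower bound CONST - rho*b_i^2*gamma + C/(gamma - S)."""
    beta = -RHO * B2[I]
    return S + positive_root(beta, CONST + beta * S, C)


def newton_step(g):
    return g - F(g) / dF(g)


def improved_step(g):
    """Root of G_i: R_i linearized at g, dominant term kept exactly."""
    beta = dR(g)
    return S + positive_root(beta, R(g) + beta * (S - g), C)


def iterate(step, g, tol=1e-12, max_iter=100):
    """Apply `step` until |F| <= tol; return the final point and step count."""
    count = 0
    while abs(F(g)) > tol and count < max_iter:
        g = step(g)
        count += 1
    return g, count


if __name__ == "__main__":
    print("starting point from the lower bound:", starting_point())
    g0 = S + 1e-2  # deliberately poor start, close to the singularity
    print("Newton  :", iterate(newton_step, g0))
    print("improved:", iterate(improved_step, g0))
```

Since each improved iterate is at least as close to $\gamma^*$ as the corresponding Newton iterate, the improved iteration should not need more steps than Newton's method from the same starting point. The deliberately poor start near the singularity is where the difference is expected to be largest.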
Recalling that $F_i'(\gamma) < 0$, the previous inequalities then show that the root of $G_i(\gamma)$ must lie between the regular Newton iterate $\gamma_{k+1}$ and $\gamma^*$, and therefore that $\gamma^* - \bar{\gamma}_{k+1} \le \gamma^* - \gamma_{k+1}$. For a convex decreasing function such as $F_i$ where all the Newton iterates (regular or improved) lie on one side of the root, continuing from a point closer to the root with either the regular or improved Newton method also causes the following iterate to be closer to the root. Recalling from Theorem 2.2 that the sequence $\{\gamma_k\}$ converges to $\gamma^*$, the sequence $\{\bar{\gamma}_k\}$ therefore converges to $\gamma^*$ at least as fast as the sequence $\{\gamma_k\}$. □

For $i = n$, the function $F_i$ is of the same form as for $i$