How to Really Find Roots of Polynomials by Newton's ... - CiteSeerX

1 downloads 0 Views 385KB Size Report
Dec 17, 1998 - Even if, for a given polynomial, the set of starting points not converging to ..... points in just the same way that periodic dynamic rays of polynomials land .... that the two circular boundary segments become opposite sides of the rect- ... that the angles between adjacent points on the various circles di er by the.
How to Really Find Roots of Polynomials by Newton's Method John Hubbard, Dierk Schleicher and Scott Sutherland December 17, 1998

Contents

1 2 3 4 5 6 7 8 9

Introduction The Global Geometry The Immediate Basins Geometry of the Channels Hitting the Channels Estimating the Constants If all the Roots are Real Points on a Single Circle The Number of Iterations

2 7 9 11 15 22 25 27 29

Abstract

We investigate Newton's method to nd roots of polynomials of xed degree , appropriately normalized: we construct a nite set of points such that, for every root of every such a polynomial, at least one of these points will converge to this root under Newton's map. The cardinality of such a set can be as small as 1 11 log2 ; if all the roots of the polynomial are real, it can be 1 30 . d

:

:

1

d

d

d

Newton's Method for Polynomials

2

1 Introduction Newton's method is well known and widely used to nd zeroes of di erentiable functions. Suciently close to a zero, it converges very rapidly. The problems with Newton's method are on a global scale: how to nd good starting points which detect all the roots within a reasonable number of iterations. In this paper, we will present such a nite collection of starting points such that the Newton method started at these points nds all the roots within a number of iterations depending only of the degree of the polynomial. It is not our intention to nd the fastest possible method for factoring polynomials; rather, our aim is to better understand the dynamics of the classical easy-to-implement Newton method. Let Pd be the space of polynomials of degree d, normalized so that all their roots are in the open unit disk D . For any such polynomial, we will be interested in its Newton map Np : P1 ! P1 de ned by Np(z) = z ? p(z)=p0 (z). It is well known that points z suciently close to a root of p will converge to this root of p under iteration of Np. If the root is simple, then the convergence is quadratic, otherwise linear. If two polynomials di er by a multiplicative constant, then their Newton maps are equal, so we may as well suppose our polynomials to be monic and de ned by the position of their roots: p(z) = Q i (z ? i ). It is well known that there exist polynomials for which open subsets of the complex plane do not converge to any of the roots, but instead for example to attractive periodic orbits. An example is shown in Figure 1. In fact, the set of such polynomials is open in the space of all polynomials of any given degree. Even if, for a given polynomial, the set of starting points not converging to any root has no interior, it might have positive measure. Finally, it is not obvious how to nd starting points which converge to any given root of the polynomial. The goal of this paper is to present a collection of starting points which will nd all the roots of any polynomial in Pd , and we show that the number of iteration steps required to nd the roots to within " is bounded by a constant depending only on the degree d and on ".

Theorem 1 (Main Theorem)

For every d  2, there is a set Sd consisting of at most 1:11 d log2 d points in C with the property that for every polynomial p 2 Pd and each of its roots, there is a point s 2 Sd converging to the chosen root under the Newton map.

Newton's Method for Polynomials

3

Figure 1: The Newton map for the polynomial z 7! z3 ? 2z + 2 has a superattracting cycle of period 2. Left: the Newton map over the reals, showing the 2-cycle. Right: the same Newton map over the complex numbers. Black indicates the Julia set as well as the basin of the superattracting 2-cycle. For polynomials all of whose roots are real, there is an analogous set S with at most 1:3 d points. In any case, the number of iteration steps it takes to nd the root with precision " > 0 is bounded by a function N (d; ") independent of p and of the root.

The theorem is constructive: in Section 5, we explicitly construct such a set Sd in the general case (see Figure 2): it will consist of 0:2663 log d circles, each containing 4:1627 d log d points at equal distances. (We have ignored the fact that the numbers of circles and points must be integers, so the actual number of points can be slightly larger due to rounding up.) The restriction to polynomials with all the roots in the unit disk is not severe. For any polynomial, it is easy to estimate the maximal absolute value of any of its roots in terms of the coecients: for jzj too large, the highest order term will dominate the rest of the polynomial and there can be no root. Appropriate rescaling of the coordinate system will bring all the

Newton's Method for Polynomials

4

roots into the unit disk. Moreover, there is a precise classical criterion based on continued fractions to determine whether or not all the roots of a given polynomial are in the unit disk; see [JT, Chapter 7.4]. This criterion was brought to our attention by A. Leutbecher. We believe that the cardinality of Sd is not too far from optimal: we anticipate that a universal set Sd as speci ed in the theorem must have more than O(d log d) elements, i.e., jSdj=d log d ! 1. In the real case, the number of our points is only a constant times the number of roots.

Figure 2: A set of starting points which hits the immediate basin of every root of every polynomial in Pd (sketch; typically, the number of circles is much smaller than the number of points on each of them). Concerning the number of iterations, it is easy to see that N (d; ")  O(d j log "j) over the entire space of degree d polynomials with our normalization, as can be veri ed for the trivial polynomial p(z) = zd with Np(z) = (d ? 1)z=d. We do not have a constructive upper bound for N (d; "), but it seems plausible that z 7! zd is already the worst case. It is possible to redo the results in this paper for the relaxed Newton method, i.e., for z 7! z ? hp(z)=p0(z) with a real constant h 2 (0; 1).

Newton's Method for Polynomials

Figure 3: The Newton map for a polynomial of degree 7, showing that every root can be connected to 1 within its basin of attraction.

5

Newton's Method for Polynomials

6

This paper builds on results and ideas of many other people: F. Przytycki [Pr] has investigated the immediate basins of attraction of roots under Newton's map; A. Manning [M] has shown how to nd at least a single root of a polynomial, and he has given an explicit bound how many iterations it takes to nd it with prescribed precision. Many of the arguments in this paper are borrowed from these people. Since we will need their results together with their constructions, we include them here. The organization of the paper and of the proof is as follows: in Section 2, we give a number of introductory lemmas which help to set up the algorithm. In Section 3, we turn to the immediate basins of the roots and show that they have \channels" which extend all the way out to in nity; these results go back to Przytycki and Manning. We investigate the geometry of the channels in Section 4: these channels are not too narrow; following a suggestion of Douady, we measure their widths in terms of the modulus of the quotient of the annulus identi ed by the dynamics. Using a variant of the residue theorem, we give a lower bound on the width of at least one channel for every root. Since all the roots and thus all the interesting dynamics of the Newton map are within the unit disk, we have good control suciently far away from this disk. In Section 5, we show how to nd starting points within these channels. This is accomplished using extremal length arguments and a sequence of conformal mappings involving the elliptic modular function; the detailed calculations are collected in Section 6. The case that all the roots of the polynomial are real allows a number of improvements which will be discussed in Section 7. Placing all the starting points onto a single circle is also possible, but the number of necessary points is substantially larger (and the calculations get more involved): see Section 8. Finally, in Section 9, we show that the number of iteration steps to nd all the roots with prescribed precision is bounded, depending only on the degree and the required precision. Acknowledgements. We have bene tted from suggestions by many people, among them Adrien Douady, Myong-Hi Kim, Armin Leutbecher, Anthony Manning, Feliks Przytycki and Mitsuhiro Shishikura, who will nd some of their contributions acknowledged throughout the paper. We are grateful to all of them, as well as to numerous colleagues who have shared their ideas with us over the years. Moreover, we would like to thank the Institute for Mathematical Sciences and its director John Milnor for its hospital-

Newton's Method for Polynomials

7

ity, support, and inspiring environment. It has helped bringing us together again. In addition, Dierk Schleicher would like to thank the Studienstiftung des Deutschen Volkes for its support all along, as well as for funding a workshop on holomorphic dynamics on which he was introduced to the dynamics of Newton's method. During this workshop and later, discussions with Walter Bergweiler, Fritz von Haeseler, Hartje Kriete and Ste en Rohde were quite helpful. Scott Sutherland would like to thank his thesis advisor Paul Blanchard. This thesis was a germ of this work. Discussions with Bob Devaney, Dick Hall, and Greg Buck were extremely helpful.

2 The Global Geometry A critical point of a holomorphic map is, as usual, a point where the derivative vanishes. Critical points of Np = id?p=p0 are solutions of Np0 = pp00=(p0)2 = 0, i.e., zeros and in ection points of p. If a root of p is simple, then it is a critical point of Np, i.e., a superattracting xed point. A root of multiplicity k  2 is attracting with real multiplier (k ? 1)=k. In order to study the geometry of the immediate basins, we will need three simple lemmas. The rst is known as Lucas' theorem; see the rst two chapters in Marden [Ma] and in particular Theorem 6.1.

Lemma 2 (Critical Points of Polynomial)

The convex hull of the roots of any polynomial contains all its critical points, as well as the zeros of any higher derivative of the polynomial. In particular, it contains the critical points of the Newton map. Proof. By induction, it suces to discuss the positions of the critical points.

If all the roots of the polynomial are contained in any half plane, then by translating and rotating the coordinate system, weQ may assume this to be the left half plane. PFactoring the polynomial as (z ? i), the derivative Q becomes (z ? i) (1=(z ? i)). For z on or to the right of the imaginary axis, the real parts of z ? i and thus of 1=(z ? i ) are positive, and the derivative cannot vanish. In the subsequent two lemmas, we will always consider a polynomial p 2 Pd and its Newton map Np.

Newton's Method for Polynomials

8

Lemma 3 (Asymptotic Geometry of the Newton Map) For jzj  1, we have Np(z) ? d?d 1 z  1=d and jNp(z)j < jzj.



Proof. First assume jzj > 1. Factoring the polynomial p again as p(z) = Q i (z

? i), we can write

Np(z) = z ? pp0 (z) = z ?

P

1

1

i z ? i

:

All the z ? i are contained within the closed disk of radius 1 around z. This disk does not contain the origin. Denoting it by D, then the domain fw : w?1 2 Dg is another disk in C andPcontains all the (z ? i )?1. Summing and taking the inverse, we see that 1= i z?1 is contained in the closed disk of radius 1=d around z=d, and this proves the claim for jzj > 1. Finally, for all jzj  1 the rst result follows by continuity. For z 2 C with jzj  1, the inequality jNp(z)j  jzj is possible only if jzj = 1 and Np(z) = z by the estimate we just proved. However, this implies that z is a root of p, while we had assumed that all roots have absolute values less than 1. For the following lemma, let U := fz 2 P1 : jzj > 1g. i

Lemma 4 (Domain of Linearization of In nity)

The domain of linearization of the Newton map around the repelling xed point 1 includes the region U : there is a univalent map ' : U ! C such that '  Np = d ?d 1 ' on fz 2 U : Np (U ) 2 U g  fz 2 U : jzj > (d + 1)=(d ? 1)g. Proof. It follows from Lemma 3 that Np (@U )

2 D , so one connected

component of N ?1 (U ) is contained in U ; denote it V .

By Lemma 2, there is no critical point of Np in U and thus in V , and Np: V ! U is a conformal isomorphism. The inverse map Np?1 : U ! V  U is a contraction for the Poincare metric, so every point in U will converge to 1 under this branch of Np?1 . Near in nity, Np?1 has the asymptotic form z 7! (d=(d ? 1))z, so the map !n d ? 1 1 (?n) (z ) ' : U ! P ; '(z) := nlim N p !1 d p

Newton's Method for Polynomials

9

is univalent on U and satis es the relation '(Np?1 (z)) = (d=(d ? 1))'(z) on U (the convergence of the limit is well known; see for instance [Mi, x 6]). Therefore, we have ((d ? 1)=d)' = '  Np on V .

3 The Immediate Basins The basin of a root of a polynomial p is the set of starting points z which eventually converge to this root. Since any root is a superattracting or at least attracting xed point of Np, the basin includes a neighborhood of the root and is in particular non-empty. The immediate basin of the root is the connected component of the basin containing the root. The following proposition is due to Przytycki [Pr] and, in more general form, to Shishikura [Sh]. We do not give a proof here.

Proposition 5 (Immediate Basins are Simply Connected)

The immediate basin of any root is simply connected and thus conformally isomorphic to the open unit disk. The basic observation which allows to nd roots is that every immediate basin has the point 1 on its boundary, and there are curves within the immediate basin connecting the root to 1. More precisely, we have non-selfintersecting curves : [0; 1] ! P1 with (0) equal to the root, (1) = 1 and

(t) in the immediate basin for t 6= 1. Any homotopy class of such curves is called an access to in nity. Outside of an appropriate compact subset of C , an access corresponds to a connected open subset of the immediate basin through which the curve is running. We will call such a set a channel to in nity (we avoid the word \canal" in order to not cause discomfort to people with bad memories of root canals). Suciently close to in nity, a channel is periodic with respect to linearizing coordinates.

Proposition 6 (Channels in Immediate Basins)

Every immediate basin of a root has a nite positive number of channels to in nity. The number of channels is equal to the number of critical points of the Newton map within the immediate basin, counted with multiplicity.

This proposition is another instance of a very general phenomenon in complex dynamics: critical points determine the dynamics.

Newton's Method for Polynomials

10

Proof. Let be a root of a polynomial p and denote its immediate basin

by U . Let m be the total number of critical points of Np in U , counting multiplicities. There is at least one: if the root is superattracting, it itself is a critical point; if it is attracting, then a standard theorem in the theory of iterated rational maps asserts that there is a critical point of Np in the immediate basin of attraction. The restriction of Np to U is a proper self-map of U with degree m + 1. Denote the open unit disk by D and let ' : U ! D be the Riemann map of U , uniquely normalized by the two conditions '( ) = 0 and '0( ) > 0. The map f := '  Np  '?1 is then a proper holomorphic self-map of D , also of degree m + 1. By the Schwarz re ection principle, it extends to a holomorphic self-map of P1 of degree m + 1, i.e. a rational map which we also denote by f . The circle S1 = @ D , as well as D and P1 ? D , are completely invariant. It follows that f cannot have critical points on S1. As a rational map of degree m + 1, the map f has m + 2 xed points, counting multiplicities. Among them, there are the superattracting xed points 0 and 1 which attract all of D and P1 ? D respectively, and m extra points which must necessarily be on S1 . These points cannot be attracting or parabolic, for they would have to attract points in D otherwise. Since D is invariant, the multipliers of the xed points on S1 must be positive real and thus strictly greater than one. It follows that all the m xed points on S1 are distinct (and f restricts to a self-covering of S1 of degree m + 1). If we assume that the conformal isomorphism '?1 : D ! U extends continuously to the boundary, then the m xed points of f on @ D will transfer to m xed points of Np on @U . A xed point of Np = id ? p=p0 in C must be a root of p, which cannot be on the boundary of U . The only extra xed point of Np is 1, so the domain U will extend out to 1 in m di erent directions. Of course, we cannot always assume that '?1 extends continuously to the boundary. In general, it is easy to construct invariant curves through each xed point of f on S1 intersecting S1 transversally (for example straight lines in linearizing coordinates). The restrictions of such lines to D will yield curves within immediate basins which converge to well-de ned boundary points in just the same way that periodic dynamic rays of polynomials land at well-de ned points in the Julia set [Mi, x 18]. These boundary points must be xed points, and the only candidate is 1. All curves converging to the same boundary xed point of D are obviously homotopic within D . It is not hard to show that di erent xed points of

Newton's Method for Polynomials

11

f on S1 will lead to non-homotopic curves and thus to di erent channels: otherwise, the entire boundary segment of S1 between two xed points would have to be mapped onto a single point on the boundary of U under the Riemann map. (Stronger yet, two di erent channels will always enclose a root of the polynomial other than the one the immediate basin belongs to.) Conversely, any non self-intersecting curve in U starting at the root and converging to 1 will be homotopic to one of the m curves constructed above. The reason is the same as for the following result for polynomials [Mi, Lemma 18.3]: if a repelling periodic point is the landing point of a dynamic ray of period 1, then all dynamic rays landing at this point are periodic of period 1. It follows that the number of channels is exactly equal to the number of boundary xed points of D and thus to the number of critical points within the immediate basin. We observe that every circle outside D centered at the origin will intersect every immediate basin. This will allow us to nd starting points for all the roots.

4 Geometry of the Channels In this section, we will investigate the geometry of the channels more closely. Since the dynamics outside the closed unit disk is approximately linear, the shape of the channels will repeat periodically near in nity. We will measure the \width" of any such channel in terms of the conformal modulus of the annulus formed by the channel modulo the dynamics. The immediate basins are simply connected. Restricting any channel to the exterior of the unit disk and taking the quotient by the dynamics, we obtain a Riemann surface homeomorphic to an annulus, hence conformally equivalent to a standard cylinder of some height h and circumference c. This is a rectangle of sides c and h with the latter pair of sides identi ed. Associated to such a cylinder is its modulus h=c, which is a conformal invariant (it is not hard to see that the modulus must be a nite positive number in our case; see below). We will call this number the modulus of the channel. It will serve to measure the width of the channel. We will now provide a lower bound on these moduli.

Newton's Method for Polynomials

12

Proposition 7 (Widths of the Channels)

If the number of critical points of the Newton map within some immediate basin is m, counting multiplicities, then this basin has a channel to in nity with modulus at least = log(m + 1). In particular, every basin has a channel with modulus at least = log d, where d is the degree of the polynomial. Proof. We will use the construction and the notation of the proof of

Proposition 6. We have an immediate basin U and a conformal isomorphism ' : U ! D sending the root to the origin. Choose a channel of U ; it corresponds to a xed point 2 S1 of the map f = '  Np  '?1 extended to P1. The point is repelling with positive real multiplier . (Although the point corresponds to the xed point 1 of Np, the multipliers may well be di erent, since the conjugation does not exist in a neighborhood of the xed points; in particular, all the xed points of f on S1 will in general have di erent multipliers.) Linearizing the repelling xed point , we obtain a linear map w 7! w, and S1 intersected with the domain of linearization turns into a straight line (see Figure 4). The quotient of the channel corresponding to will then be conformally equivalent to a half plane divided by multiplication by . As a fundamental domain, we may take the region in the half plane bounded by the two semicircles of radii 1 and . Taking logarithms, the annulus is represented by a rectangle between imaginary parts ?=2 and =2 and between real parts 0 and log , with the two vertical boundaries identi ed. This is a conformal cylinder with height  and circumference log , so the conformal modulus is = log . We will now use the holomorphic xed point formula (see for example Milnor [Mi, Section 10]): for any rational map f without xed points of multiplier +1, we have X 1 = +1 ; 1 ? f 0 (z) z

where the sum extends over all xed points of f on P1 . This formula is a corollary to the residue theorem for the map 1=(f (z) ? z) (best evaluated in coordinates for which 1 is not a xed point). For the map f we are considering, we have superattracting xed points at 0 and, by symmetry, at 1. Both contribute +1 to the sum. The only further xed points of f are the m distinct repelling xed points on S1 . Summing over those and changing

Newton's Method for Polynomials

Figure 4: Estimating the width of the channels, de ned as the moduli of the quotient annuli by the dynamics. The gure shows an immediate basin U of some root, with a fundamental domain of one channel shaded; the disk D as a conformal model of the immediate basin, with one boundary xed point for every channel of U ; a half neighborhood of such a xed point in local linearizing coordinates ; and nally a rectangle representing a fundamental domain of the channel.

13

Newton's Method for Polynomials signs, we nd

X

z 2S

14 1

0 1 f (z ) ? 1

= +1 :

(1)

Since all the denominators are real and positive, there is at least one with 1=(f 0( ) ? 1)  1=m and hence f 0( )  m + 1, so the modulus of the corresponding channel is at least = log(m + 1). Recall that m is the total number of critical points of f within the immediate basin. If all the roots of p are distinct, then the map Np has degree d and therefore a total of 2d ? 2 critical points, of which one sits at each of the roots of p. The extreme case is that one immediate basin contains one critical point at its root and all the d ? 2 remaining \free" critical points. Multiple roots reduce the degree of Np according to their multiplicities, and their immediate basins contain at least one critical point. The number of \free" critical points is thus at most d ? 2 even in the presence of multiple roots. Therefore, m  d ? 1 and the modulus is at least = log(m + 1)  = log d. Remark on the size of the immediate basins. The calculation in

Equation (1) allows to estimate the area of all the immediate basins: since all xed points on S1 are repelling, it follows from (1) that all their multipliers satisfy f 0(z)  2 and thus f 0(z) ? 1 > log f 0(z)= log(2), so the sum of the moduli of all the channels of any root is at least = log(2), and for all the roots combined the sum of the moduli is at least d= log(2). In linearizing coordinates near 1, the fundamental domain of the Newton map is bounded by two circles such that quotient of their radii is d=(d ? 1)  1 + 1=d, and for large degrees there is room for modulus 2d. By a standard lengh-areainequality, the fraction of the area of all the channels within any linearized fundamental domain is at least the fraction of the moduli they use up, which is 1=2 log(2)  0:721 (at least in the limit d ! 1 when the distance between the two circles bounding any fundamental domain is small compared to the radii of the circles). Therefore, the fraction of the area of all the immediate basins within any centered disk of radius R is asymptotically at least 0:72 for R and d large. A quantitative version of such a result has been given in [Su] (with a fraction of 0:09) and in Kriete [Kr] (with a fraction of 0:5).

Newton's Method for Polynomials

15

5 Hitting the Channels At least in principle, it is now clear how to proceed in order to nd the roots: outside of a compact neighborhood of the origin, the dynamics of Np is approximately linear, and we know that every immediate basin must have a channel with a certain minimal width. We therefore have to distribute suciently many points within some fundamental domain of the dynamics so that no channel of this width can possibly avoid all these points. This was rst done in [Su]: 11d(d ? 1) points were distributed uniformly on a single circle centered at the origin, and it was shown that if a channel is narrow enough to squeeze between these points, then its modulus must be less than = log d. In Section 8, we will show that (2=4)d3=2 points suce for this purpose. However, it is geometrically conceivable that a channel with large modulus becomes relatively thin at just one place within every fundamental domain (compare the sketch in Figure 5). The worst case is that this happens just on the circle containing the starting points. We will thus place the points onto several circles within a single fundamental domain. This has two advantages: the number of points to be placed onto each of these circles is reduced to such an extent that the total number of necessary points can be as small as 1:11 d(log d)2. Moreover, the geometry of the problem rescales and becomes independent of the degree. This would allow us to derive the result (up to a constant which is independent of the degree) without explicit calculations. Before giving a precise proof of the Main Theorem, we will outline the argument using various simpli cations. The rst of them is the assumption that the Newton map outside of D is exactly the linear map z 7! z(d ? 1)=d, so that a fundamental domain for the dynamics is any annulus between two radii R > 1 and Rd=(d ? 1). We are looking for a channel traversing such a fundamental domain which has modulus at least = log d. Now we cut this annulus into some number s of conformally equivalent circular sub-annuli, each of them bounded by a pair of circles such that the ratio of outer and inner radius is (d=(d ? 1))1=s. Making the further assumption that the inner and outer boundaries of every sub-annulus meet the channel in a connected arc, then the intersection of the channel with a sub-annulus becomes a conformal quadrilateral in such a way that its boundary parts on the circles form two opposite sides of the quadrilateral. Each quadrilateral has an associated modulus (mapping the quadrilateral biholomorphically onto a rectangle such

Newton's Method for Polynomials

16

that the two circular boundary segments become opposite sides of the rectangle, this modulus is then the ratio of the lengths of the rectangle's sides so that the length corresponding to the circular arcs is in the numerator; the modulus of a quadrilateral is a conformal invariant). By the Grotzsch inequality, the inverses of the moduli of the rectangles add up to at most the inverse of the modulus of the channel. Therefore, at least one sub-annulus has inverse modulus at most log d=s and intersects the channel in a quadrilateral with modulus at least s= log d. Writing s = log d, the modulus of the quadrilateral becomes  independently of d. We will distribute some xed number of points onto a single circle within each of the sub-annuli such that the angles between adjacent points on the various circles di er by the same constant.

Figure 5: Left: A fundamental domain of the Newton dynamics under the simplifying assumption that it is bounded by two circles, subdivided into s = 3 concentric sub-annuli. Shaded is the intersection of the fundamental domain with a channel of some root. Right: The same situation in logarithmic coordinates; the three sub-annuli turn into vertical strips (moved apart to show them separately). Each strip contains a vertical sequence of points. We now take logarithms; see Figure 5. The sub-annuli become in nite vertical strips of width log(d=(d ? 1))=s > 1=ds (we take all the branches of the log, so that the exponential maps from the strips to the sub-annuli are universal covers). Stretching by a factor s= log(d=(d ? 1)) < ds, the strips get width 1; the channel becomes a series of s quadrilaterals connecting the two boundaries of each of the strips, and one of these quadrilaterals has

Newton's Method for Polynomials

17

modulus at least  (strictly speaking, there are such quadrilaterals for every branch of the logarithm). We will now use the following universal result from conformal geometry:

Lemma 8 (Universal Geometry of Points in Strip) Let S be the vertical strip fz 2 C : ?1=2 < Re(z) < 1=2g and let Q be a

quadrilateral in S connecting the vertical boundaries of S . For every > 0, there is a number  > 0 with the following property: if the modulus of Q is , then at least one of the points i Z is contained in Q.

We will justify this in Lemma 12. If  is small enough, one of the points i Z will hit the logarithm of the channel. Transporting the points i Z back into the sub-annuli we started with, we obtain a collection of points within every sub-annulus such that the angles of adjacent points di er by at least =ds, and at least one point on at least one circle will hit a channel of every root. The number of points on such a circle is then at most 2ds= , and since there are s sub-annuli with one circle each, the total number of points used is at most 2ds2= = (2 2= )d(log d)2 . The choice of > 0 is arbitrary, and  depends only on but not on d. These numbers can be optimized so as to minimize the total number of points. This gives a proof of the Main Theorem, up to a universal constant  ( ) yet to be speci ed, and up to the various simplifying assumptions made in the argument: the Newton map is not exactly linear, and the intersections of the channel with boundaries of the sub-annuli need not be as simple as assumed. We will now provide rigorous arguments for the claim; precise estimates will be given in Section 6. First we recall the de nition of the modulus of an annulus via extremal length (see e.g. Ahlfors [A, Section I.D]): if U  C is an annulus, then `2 ( ) ; (mod (U ))?1 = sup inf  k2 kU where the supremum is taken over all measurable functions : U ! R +0 with strictly positive norm k2 kU := U 2 (x; y) dx dy. The in mum is taken over all recti able curves  U which wind once around U (i.e., which generate the fundamental group of U ). The length of such a curve is de ned with respect to  as `( ) := (z)jdzj whenever this integral exists; if it does not exist, we set `( ) := +1. It follows easily that the modulus of an annulus R

R

Newton's Method for Polynomials

18

is a conformal invariant: if ': U ! U 0 is a conformal isomorphism, then an optimal function  for U turns into an optimal 0 for U 0 by the relation 0 ('(z)) = (z)=j'0 (z)j. Closely related to moduli of annuli are moduli of quadrilaterals: let Q be a bounded subset of C with two distinguished connected subsets of the boundary so that the remaining boundary also consists of two connected subsets (for example, a rectangle with two opposite boundary sides). Admissible curves in this case are curves connecting the two distinguished boundaries, and the de nition of the conformal modulus is exactly as above. We again obtain a conformal invariant, and the modulus of a rectangle with sides h and c is h=c if the pair of sides of length h is distinguished. In our case, the restriction of any channel to a fundamental domain is not really an annulus in C ; instead, we identify the boundaries of this fundamental domain by the dynamics. The channel then becomes an annulus within the quotient torus, and the de nition above applies analogously. Any allowable function  gives a lower bound for `( ) and thus an upper bound for the modulus. We will be interested in such an upper bound, so it will suce to place any such  onto our annulus (provided that it makes `( ) strictly positive, uniformly for all ). Using extremal length, we can rephrase Lemma 8 like this:

Lemma 9 (Universal Geometry of Points in Strip II) Let S := fz 2 C : ?1=2  Re(z)  1=2g be a vertical strip and x a

number  > 0. Then there is a number m( ) > 0 and a continuous function 2 R+ 0 with k kS = 1 such that any curve in S connecting the two boundaries through the interval (0; i ) has squared length `2 ( )  1=m. As  tends to zero, 1=m( ) tends to 1, and the function m( ) can be chosen so as to be monotone.

: S !

We will provide a proof in Lemma 12, together with an estimate for m( ). For xed  , any reasonable choice of  gives a lower estimate for `( ) and thus for 1=m. In order to obtain the limiting behavior for narrow points, some care is necessary in the choice of . For the construction of the point grid, we need to have some space within a fundamental domain.

Newton's Method for Polynomials

19

Lemma 10 (Fundamental Domain) For every radius R > (d + 1)=(d ? 1), there is a number  2 (0; 1) such that the round annulus

V := fz 2 C : R((d ? 1)=d)  jzj  Rg  V 0 is contained within a single fundamental domain of the dynamics.

The quantity  measures the size of a round annulus that ts into a fundamental domain, relative to the size expected by the linearized map z 7! (d ? 1)z=d. The larger , the better will be our estimates. Proof. By Lemma 4, the image of the circle at radius R is a simple closed curve around the unit disk, and the region between the circle and its image curve is an annulus V 0 . By Lemma 3, every point on the image curve has distance to the origin of at most (d ? 1)R=d + 1=d = R ? (R ? 1)=d. If  2 (0; 1) is such that R((d ? 1)=d)  R ? (R ? 1)=d, then the round annulus V is indeed contained within a single fundamental domain. This inequality is obviously satis ed at least for  suciently close to 0. Remark. The allowed values pof  tend to 1 as R

! 1; and the choice

 = 1=2 is allowed for R > 1 + 2, independently of d.

The construction of the point grid. We start with a round annulus V with outer radius R and inner radius R((d ? 1)=d)) as speci ed in Lemma 10. The grid of starting points will be determined by two numbers and which will be determined later: the number of circles onto which we place our points is s = log d, and the number of points per circle is d log d (the quantity is related to the constant  in the heuristic argument above). For  = 1; 2; : : : ; s, consider the sub-annuli of V 8
m( ), and the claim holds for ( ; ) = m(2 = )= . Remark. It is geometrically clear that increasing (the number of circles)

for xed can only decrease ( ; ). This fact can be extracted from our calculations only by a more precise estimate on ( ), but we will not need it. The limit of ( ; ) for ! 1 with xed is 2= (for large d), so it does not even tend to 0: the limiting point grid will be densely packed onto d log d radial lines. We can now prove the theorem up to constants. Proof of the theorem. By Proposition 7, any root has at least one channel with annulus at least = log d. Choose any number > 0. For any > 0, Proposition 11 speci es a number ( ; ) such that the point grid

Newton's Method for Polynomials

22

consisting of log d circles having d log d points each hits every channel with modulus at least ( ; )= log d. There is some choice of depending only on such that ( ; ) <  so that the grid of points will hit at least one channel of every root. The total number of points used in such a grid is d log2 d. It remains to calculate the number in terms of and to minimize their product. We will do this in the next section.

6 Estimating the Constants In order to provide the constants promised in the theorem, we will need the elliptic integral Z z 0 ; z 7! (z) = q 0 0 dz 1 z (z ? 1)(P ? z0 ) depending on a real constant P > 1. This integral is de ned on the entire complex plane except if z is real and either greater than P or negative; where de ned, it should be evaluated along a straight line from 1 to z. Of special importance will be the two numbers Z 1 Z 1 0 0 dz dz q q A(P ) = = P z0 (z0 ? 1)(z0 ? P ) 0 z0 (1 ? z0 )(P ? z0 ) and Z 0 Z P 0 dz0 dz q = : B (P ) = q 0 0 1 z (z ? 1)(P ? z0 ) ?1 (?z0 )(1 ? z0 )(P ? z0 ) The map provides a conformal isomorphism of the upper half plane onto the rectangle with vertices 0, B (P ), B (P ) + iA(P ), iA(P ); this isomorphism extends as a homeomorphism to the boundaries so that the four points 1, P , 1, 0 on the extended real line map to the vertices in this order (compare for example Nevanlinna and Paatero [NP]). The reason is that the integrand on the real line is either real or imaginary, so that the image of the real line surrounds a rectangle exactly once. Similarly, the negative half plane is mapped to the re ected rectangle. The following lemma will be the motor for all the estimates. It is a quantitative proof of Lemma 9 and proves Lemma 8 as well.

Newton's Method for Polynomials

23

Lemma 12 (Geometry of Points in Strip) Let S := fz 2 C : ?1=2  Re(z)  1=2g be a vertical strip and x a number  > 0. Then there is a continuous function : S ! R +0 with k2kS = 1 such that any curve in S connecting the two boundaries through the interval (0; i ) has squared length `2 ( )  2A(P )=B (P ) for P = e2 . The function  7! P 7! 2A(P )=B (P ) is strictly monotonically decreasing and tends to 0 as  tends to 0. Proof. The map E (z ) := exp(?2iz ) maps S biholomorphically onto

C

? R ?0 , and it sends the interval I := (0; i ) onto (1; P ) for P = e2 ,

while it sends both boundaries of S onto the negative real line. The left and right halves of S are mapped onto the upper and lower half planes, respectively. The map sends these half planes biholomorphically onto the rectangles with real parts between 0 and B (P ) and imaginary parts between 0 and A(P ). The joint area of these two rectangles is 2A(P )Bq(P ). On these two rectangles, we will use the constant function 0 (z) := 1= 2A(P )B (P ). It is normalized so that k20 k = 1. Any curve within these two rectangles connecting the upper and lower boundaries (i.e., imaginary parts A(P ) and ?A(P )) will q q have length at least 2A(P )= 2A(P )B (P ) = 2A(P )=B (P ). (By Ahlfors [A, Chapter 1], this choice for 0 is optimal). Now we transport 0 to a function : S ! R +0 using the conformal isomorphism (  E )?1. The total norm of 2 will again be 1, and any curve connecting the two boundaries of S through the interval (0; i ) has squared length at least 2A(P )=B (P ): under E , the image of the curve connects the negative real axis to itself through the interval (1; P ), running around [0; 1]. Under , we obtain a curve traversing the pair of rectangles from top to bottom, indeed with squared length 2A(P )=B (P ) or more. As  decreases, P decreases as well; there is less room for the curves , so their lengths can only increase. This shows monotonicity of the function  7! P 7! 2A(P )=B (P ) (see also Ahlfors [A, Chapter 3]). As  tends to 0, P tends to 1 and A(P ) tends to 1, while B (P ) tends to a well-de ned nite value (this can be seen most easily in the second variants of the two integrals for A(P ) and B (P )). The limiting behavior for the squared lengths follows. Our next task is to minimize the total number of points d log2 d, i.e.

Newton's Method for Polynomials

24

to minimize . First we have to choose some number  depending on the radius R of the fundamental domain (the larger R, the larger we can choose  and the fewer the total number of points needed, but the longer it takes for the starting points to iterate towards the unit disk). It will be convenient to start with  , the vertical distance between points in the strips S . Set P := e2 . In order to hit all the channels with moduli at least = log d, the number ( ; ) from Proposition 7 must be less than . Since ( ; ) = m( )= with m( ) as in Lemma 8, we need to choose  m( )=, and we will x = m( )=. By Lemma 12, m( ) = B (P )=2A(P ) and thus = B (P )=2A(P ). By Equation (2), we have  < 2 = , so the necessary is less than 2 = . Increasing can only help, so we can use = 2 = . The nal product reads then 2 2 2  1 B ( P ) <  = 2 A(P ) ; it only depends on  . For  = 0:40198, we have P  12:5,  0:2663 and  4:1627, so  1:1086. This gives a valid choice of points for the theorem; the reader is invited to verify that these numbers are indeed close to the minimum. Of course, the numbers of circles and the numbers of points on each of them must be integers, so we have to round up. For small integers, it can be of advantage to use di erent values of and to yield a smaller total number of points. For example, the number of circles 0:2663 log d is less than 1 for d  42. The choice = 1=2 yields (log d)=2 circles (which is approximately equal to one for degree d = 7), and we obtain   2:26 and  2:78,  1:39. This might be a useful choice for small degrees: when there is one circle anyway, the number of points on it can be reduced. At nite radii, the Newton map is not exactly linear, and the quantity  is smaller than 1. The optimal value for  is independent of R and , and the number of circles log d does not change, either. Larger radii R yield larger values of  (closer to 1) and thus smaller values of , i.e. fewer points per circle. However, it takes p longer to nd the roots. The choice  = 1=2 is allowable for radii R > 1+ 2  2:414, and  = 0:9 is possible for R  13:94. Remark. We did not require any correlation between the points on the di erent circles. This worst case is that all the points are lined up (so that all the points on the various circles sit at equal angles). The bounds can !

Newton's Method for Polynomials

25

be somewhat improved if we rotate the points on the circles \out of phase", so that the points on adjacent circles are rotated by exactly half the angle between adjacent points. However, we have not been able to give a better upper bound for the moduli of annuli avoiding all these points. Remark. The elliptic integrals A(P ) and B (P ) used in the calculation can of course be evaluated by any numerical method. Strangely enough, various implementations of Maple were unable to calculate certain of these integrals in nite time. Fortunately, there is a much more ecient way to calculate these elliptic integrals using an iterative procedure going back to Gau. It uses the arithmetic-geometric mean. For two positive real numbers a0 and b0 , we de ne two sequences as follows: set recursively an+1 := (an + bn)=2 (the arithmetic mean) and bn+1 := p anbn (the geometric mean). For any choice of initial values, these two iterations converge extremely rapidly to a common limit, which is denoted AGM(a0 ; b0). In fact, this convergence is quadratic: as fast as the Newton map near a simple root. Now we can calculate the elliptic integrals as follows:  p : p  p and B (P ) = A(P ) = AGM( P ? 1; P ) AGM(1; P ) This relation can be found in many textbooks on elliptic functions; see for example Hurwitz and Courant [HC]. Another more explicit reference is Bost and Mestre [BM, Section 2.1] and in particular their formulas (10) and (11) (their polynomials are normalized so as to have a factor 4 in front, so their formulas di er from ours by a factor of 2).

7 If all the Roots are Real The case that all the roots of a given polynomial are real might be of special interest. It allows a particularly simple treatment. Let p be a polynomial in Pd all of whose roots are real, and denote the smallest and largest roots by min and max. Then all of fx 2 R : x  ming will converge to min, and all of fx 2 R : x  maxg will converge to max, so both ends of R will be parts of channels to in nity. By symmetry, all the other d ? 2 roots must have at least two channels each, so their immediate basins must contain at least one free critical point. Since this exhausts the available free critical points, it follows that the minimal and maximal roots

Newton's Method for Polynomials

26

each have exactly one channel, and the other roots each have exactly two channels. (This can also be shown by the fact that two channels of any root always enclose a further root.) By Proposition 7, every channel has width at least = log 3, independently of the degree. We will use a single circle, so s = log d = 1. First we nd P such that the moduli are B (P ) =  : 2A(P ) log 3 We obtain P  3 972 132 and log P  15:194 8. In the limit R ! 1, we then need 42d= log P  2:5982 d points on a full circle or, using the symmetry, 1:2991 d points on a semicircle. That is an eciency of 77% (the ratio of the roots to the number of starting points). We could try to reduce the necessary number of points by using several circles, as in the previous section. However, it turns out that it is best to use a single circle; using more circles only increases the number of points. There might be an ecient bound on the number of iterations in the case of real roots: Proposition 2.3 in Manning [M] implies that, when any point z has imaginary part y > 0, then its image under the Newton map will have imaginary part less than (1 ? 1=d)y (the imaginary part of the image may be negative; in that case, the bound says nothing at all). This might allow to bound the number of necessary iteration steps needed to get "-close to the root for points converging to it within the upper half plane. Since all the critical points converge to the roots, the Newton maps of polynomials with only real roots are hyperbolic, there are no extra attracting orbits, and the measure of the Julia set is zero. Thus almost every starting point will converge to some root. Restricting to the real line, an old result by Barna [Ba] says that the set of starting points not converging to any root is a Cantor set of measure zero. For arbitrary polynomials, wide basins are not so untypical, as the following corollary shows.

Corollary 13 (Number Versus Widths of Channels)

More than half of the roots of any polynomial of arbitrary degree have a channel of modulus at least = log 3. Proof. All the channels of any root can be smaller only if the immediate

basin contains at least two free critical points, but since there are less free

Newton's Method for Polynomials

27

critical points than there are roots, only half of the roots can behave in this way. If we distribute points as above in the real case, except using an entire circle rather than its upper half, we can be sure to nd at least half the roots. We can then de ate the polynomial and continue with the remainder. We need 2:6 d starting points in the rst step, half of that after the rst de ation, a quarter after the second de ation, and so on. The total number of starting points used is then less than 5:2 d. Of course, de ating introduces numerical errors, but this will occur only log d times, rather than d times when de ating for every individual root. In a di erent setting, Kim and Sutherland [KS] have controlled the errors introduced by similar log d de ation steps.

8 Points on a Single Circle In Section 5, we have used log d circles to place our starting points onto. This had the advantage that the dependence on the degree no longer showed up in our calculations, and it also improved the necessary number of starting points in a substantial way. In this section, we explore how the number of starting points behaves if we place all the points onto a single circle. This had earlier been done by Sutherland [Su]: after various improvements, 11d(d ? 1) points were needed. Here, we will show that (2=4)d3=2  2:47 d3=2 points are sucient, at least in the limit d ! 1. First, we will investigate the asymptotic behavior of A(P ) and B (P ) when P & 1. This is particularly easy for B (P ) because the integral still exists at P = 1. Writing P = 1 + ", we have Z 1 Z 1 p 1 dx q B (P ) = < (1 +dxx)px = 2 arctan x 0 = : 0 x(x + 1)(x + 1 + ") 0 The evaluation of A(P ) needs more care: Z 1 Z 1 dx dx q q = A(P ) = 0 1+" x(x ? 1)(x ? 1 ? ") (x + 1 + ")(x + ")x Z 1 dx q : > p1 1 + " 0 x(x + 1)(x + ")

Newton's Method for Polynomials

28

p

p

Now we split up the integral to the intervals (0; ") and ( "; 1). We get Z

p"

p

" dx dx > 1p 0 x(x + 1)(x + ") x(x + ") 1+ " 0 p 1= " dt = 1 2 log pt + pt + 1 1 = p p t(t + 1) 1+ " 0 1+ " 2 log 2"?1=4 ? 21 log " + 2 log 2 : = > p p 1+ " 1+ " Z

q

q

q

Z



q

q

q

1=p" 0







q

q

p For the second interval ( "; 1), we have the same value as for the interval p (0; ") because the Mobius transform z 7! "=z xes the four singularities f?1; ?"; 0; 1g of the integrand and turns one interval into the other. (Quite explicitly, the substitution x 7! "=x turns one integral into the other.) Therefore, we obtain

? log " + 4 log 2 > ? log " +p4 log 2 : A(1 + ") > p p 1+ " 1+" 1+ " q

Now we can estimate the modulus again as above: " describes the spacing of the points on a single circle, and any channel avoiding all these points has its modulus bounded above by B (1+ ")=2A(1+ "). Since we know that there is a channel with modulus at least = log d, we have to nd an " such that =log d > B (P )=2A(P ). Using our estimate above, we have to nd an " such that p")   : B (1 + ") <  (1 + < = 2A(1 + ") 2 ? log " +p4 log 2 2(? log " + 4 log 2) log d 1+ " p

p

pp

p

This yields "1=(1+ ") < d?1=2  161=(1+ "). But since " " and 16 " both tend to 1 as " & 0, we obtain the necessary inequality " < 16 d?1=2, up to a factor tending to 1 as d ! 1 (or " ! 0). It follows that we need a value P = 1 + "  16 d?1=2. In the approximation that N (z) = ((d ? 1)=d)z and d is large, adjacent points should have angles which di er by "=2d, so the number of points we need is 42d=" = (2 =4)d3=2  2:47 d3=2.

Newton's Method for Polynomials

29

Note that restricting the points onto a single circle not only increases the number of points substantially, it also requires the estimates to depend on the degree d. (Of course, it is possible to estimate the moduli by di erent methods, for example using the ellipse construction as in [Su].)

9 The Number of Iterations We have seen how to nd good starting points for all the roots of a polynomial in Pd . In this section, we will show that it does not take too long to nd the roots with prescribed precision when starting at the speci ed points.

Proposition 14 (The Number of Iterations) For every degree d  2, every precision " > 0, every radius r > 2 and every

set Sd of starting points as in the Main Theorem, there is a number Nd ("; r) with the following property: for every root of every polynomial in Pd , there is point in Sd which converges to this root and which takes at most Nd ("; r) iterations to get "-close to the root. Proof. We will use a compactness argument, arguing by contradiction.

Suppose that there exists a sequence of polynomials pk of xed degree d and a root k of pk such that each of these starting points either does not converge to k at all, or it takes more than k iterations to get "-close to k . By restricting to a subsequence, we may assume that all the roots of the pk converge to (not necessarily distinct) limit points in D , and in particular we may assume that k converges to a limit . We may assume that all the pk are monic. Then they converge to a limit polynomial p of the same degree with all its d roots in D , and is one of its roots. The immediate basin of attraction of has one of the nitely many starting points in Sd within its basin of attraction, say z ; let M be the number of iteration steps it takes for this point to get "=2-close to . Since attracting basins are semicontinuous from below, there is a neighborhood of p in Pd such that every polynomial in this neighborhood will have a root close to containing z in its immediate basin of attraction, and so that it takes less than M iterations for this point to get "-close to this root. This neighborhood of p will contain all pk for suciently large k, which is a contradiction.

Newton's Method for Polynomials

30

Remark. We do not have an explicit bound on the numbers Nd ("; r; s).

A trivial case with slow convergence is given by the polynomial z 7! zd , where all the roots are coalesced at the origin. The Newton map is linear: z 7! (d ? 1)z=d, and any point at distance r from the origin takes d j log "j= log r steps to get "-close to the root. The Newton method without modi cations will therefore have Nd ("; r)  d j log "j= log r. Although this example is trivial, there are non-trivial perturbations and variations of this example: for example, most of the roots being concentrated near the origin, although not all identical, and a few further roots distributed elsewhere in D . However, it seems plausible that this trivial case is the worst case, so one might conjecture that in fact Nd ("; r) = d j log "j= log r. For any xed polynomial with distinct roots, the necessary number of steps is in O(j log j log "jj) because very close to the roots the quadratic convergence takes over. This is the ideal theoretical complexity, but it is unrealistic in practice because usually the value of " is xed so that its asymptotics is useless.

References [A] [Ba] [BM] [HC] [JT] [KS]

Lars Ahlfors: Lectures on quasiconformal mappings. Van Nostrand publishers (1966) / Wadsworth&Brooks/Cole (1987).  die Divergenzpunkte des Newtonschen Verfahrens zur BesB. Barna: Uber timmung von Wurzeln algebraischer Gleichungen. Publ. Math. Debrecen 14 (1956), 384{397. Jean-Beno^t Bost, Jean-Francois Mestre: Moyenne Arithmetico-geometrique et Periodes des Courbes de Genre 1 et 2. Preprint, LMENS-88-13, Ecole Normale Superieure (1988). Adolf Hurwitz, Richard Courant, Funktionentheorie (4. Au age). Springer, Berlin 1964. William B. Jones, Wolfgang H. Thron: Continued fractions (analytic theory and applications). Addison-Wesley (1980). Myong-He Kim and Scott Sutherland: Polynomial root- nding algorithms and branched covers. SIAM Journal of Computing 23 (1994), 415{436.

Newton's Method for Polynomials

31

[Kr] Hartje Kriete: On the eciency of relaxed Newton's method. In: Pitman research notes in Mathematics Series 305 (1994), 200{212. [M] Anthony Manning: How to be sure of nding a root of a complex polynomial using Newton's method. Boletim da Sociedade Brasileira de Matematica, Vol. 22, No. 2 (1992), 157{177. [Ma] Morris Marden: The geometry of the zeros of a polynomial in a complex variable. Mathematical Surveys III, American Mathematical Society (1949). [Mi] John Milnor: Iteration in one complex variable: introductory lectures. Institute for Mathematical Sciences at Stony Brook, Preprint # 5 (1991). [NP] Rolf Nevanlinna, Veikko Paatero: Einfuhrung in die Funktionentheorie. Birkhauser Verlag, Basel 1964. [Pr] Feliks Przytycki: Remarks on the simple connectedness of basins of sinks for iterations of rational maps. In: Dynamical Systems and Ergodic Theory, ed. K. Krzyzewski. Polish Scienti c Publishers, Warszawa (1989), 229{235. [Sh] Mitsuhiro Shishikura: Connectivity of the Julia set and xed point. Preprint, Institut des Hautes Etudes Scienti ques IHES/M/90/37 (1990). [Su] Scott Sutherland: Finding roots of complex polynomials with Newton's method. Thesis, Boston University (1989). John Hubbard Department of Mathematics Cornell University White Hall Ithaca, NY 14853-7901, USA [email protected] Scott Sutherland Institute for Mathematical Sciences State University of New York Stony Brook, NY 11794-3660, USA [email protected]

Dierk Schleicher Fakultat fur Mathematik Technische Universitat Barer Strae 23 D-80290 Munchen, Germany [email protected]