Six Myths of Polynomial Interpolation and Quadrature
Lloyd N. Trefethen FRS, FIMA, Oxford University
It is a pleasure to offer this essay for Mathematics Today as a record of my Summer Lecture on 29 June 2011 at the Royal Society.

Polynomials are as basic a topic as any in mathematics, and for numerical mathematicians like me, they are the starting point of numerical methods that in some cases go back centuries, like quadrature formulae for numerical integration and Newton iterations for finding roots. You would think that by now, the basic facts about computing with polynomials would be widely understood. In fact, the situation is almost the reverse. There are indeed widespread views about polynomials, but some of the important ones are wrong, founded on misconceptions entrenched by generations of textbooks.

Since 2006, my colleagues and I have been solving mathematical problems with polynomials using the Chebfun software system (www.maths.ox.ac.uk/chebfun). We have learned from daily experience how fast and reliable polynomials are. This entirely positive record has made me curious to try to pin down where these misconceptions come from, and this essay is an attempt to summarise some of my findings. Full details, including precise definitions and theorems, can be found in my draft book Approximation Theory and Approximation Practice, available at www.maths.ox.ac.uk/~trefethen.

The essay is organised around 'six myths'. Each myth has some truth in it – mathematicians rarely say things that are simply false! Yet each one misses something important.

Throughout the discussion, f is a continuous function defined on the interval [−1, 1], n + 1 distinct points x_0, ..., x_n in [−1, 1] are given, and p_n is the unique polynomial of degree at most n with p_n(x_j) = f(x_j) for each j. Two families of points will be of particular interest: equispaced points, x_j = −1 + 2j/n, and Chebyshev points, x_j = cos(jπ/n). I will also mention Legendre points, defined as the zeros of the degree n + 1 Legendre polynomial P_{n+1}.

The discussion of each myth begins with two or three representative quotations from leading textbooks, listed anonymously with the year of publication. Then I say a word about the mathematical truth underlying the myth, and after that, what that truth overlooks.
Myth 1. Polynomial interpolants diverge as n → ∞

Textbooks regularly warn students not to expect p_n → f as n → ∞.

'Polynomial interpolants rarely converge to a general continuous function.' (1989)

'Unfortunately, there are functions for which interpolation at the Chebyshev points fails to converge.' (1996)

On the face of it, this caution is justified by two theorems. Weierstrass proved in 1885 [1] that any continuous function can be approximated arbitrarily closely by polynomials. On the other hand, Faber proved in 1914 [2] that no polynomial interpolation scheme, no matter how the points are distributed, will converge for all such functions. So it sounds as if there is something wrong with polynomial interpolation. Yet the truth is, polynomial interpolants in Chebyshev points always converge if f is a little bit smooth. (We shall call them Chebyshev interpolants.) Lipschitz continuity is more than enough, that is, |f(x) − f(y)| ≤ L|x − y| for some constant L and all x, y ∈ [−1, 1]. So long as f is Lipschitz continuous, as it will be in almost any practical application, p_n → f is guaranteed.

There is indeed a big problem with convergence of polynomial interpolants, but it pertains to interpolation in equispaced points. As Runge showed in 1901 [3], equispaced interpolants may diverge exponentially, even if f is so smooth as to be analytic (holomorphic). This genuinely important fact, known as the Runge phenomenon, has confused people. With Faber's theorem seeming to provide justification, a real problem with equispaced polynomial interpolants has been overgeneralised, so that people have suspected it of applying to polynomial interpolants in general.

The smoother f is, the faster its Chebyshev interpolants converge. If f has ν derivatives, with the νth derivative being of bounded variation V, then ‖f − p_n‖ = O(V n^{−ν}) as n → ∞. (By ‖f − p_n‖ I mean the maximum of |f(x) − p_n(x)| for x ∈ [−1, 1].) If f is analytic, the convergence is geometric, with ‖f − p_n‖ = O(ρ^{−n}) for some ρ > 1. I will give details about the parameter ρ under Myth 5.

For example, here is the degree 10,000 Chebyshev interpolant p_n to the sawtooth function f(x) defined as the integral from −1 to x of sign(sin(100t/(2 − t))). This curve may not look like a polynomial, but it is! With ‖f − p_n‖ ≈ 0.0001, the plot is indistinguishable from a plot of f itself.

[Figure: the degree 10,000 Chebyshev interpolant to the sawtooth function.]

There is not much use in polynomial interpolants to functions with so little smoothness as this, but mathematically they are trouble-free. For smoother functions like e^x, cos(10x) or 1/(1 + 25x²), we get convergence to 15 digits of accuracy for small values of n (14, 34 and 182, respectively).
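To make these convergence figures concrete, here is a minimal NumPy sketch (an illustration, not the Chebfun code used for the article) that interpolates each of the three smooth examples in n + 1 Chebyshev points and reports the maximum error on a fine grid. With the degrees quoted above, the errors come out at roughly machine precision.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_interp_error(f, n, m=10_000):
    """Max error of the degree-n interpolant to f in n+1 Chebyshev points."""
    xj = np.cos(np.arange(n + 1) * np.pi / n)      # Chebyshev points x_j = cos(j*pi/n)
    coeffs = C.chebfit(xj, f(xj), n)               # interpolant, expressed in the Chebyshev basis
    xx = np.linspace(-1, 1, m)                     # fine grid for estimating the sup norm
    return np.max(np.abs(f(xx) - C.chebval(xx, coeffs)))

cases = [(np.exp, "exp(x)", 14),
         (lambda x: np.cos(10 * x), "cos(10x)", 34),
         (lambda x: 1 / (1 + 25 * x**2), "1/(1+25x^2)", 182)]

for f, label, n in cases:
    print(f"{label:12s} n = {n:3d}   error = {cheb_interp_error(f, n):.1e}")
```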
Myth 2. Evaluating polynomial interpolants numerically is problematic

Interpolants in Chebyshev points may converge in theory, but aren't there problems on a computer? Textbooks warn students about this.

'Polynomial interpolation has drawbacks in addition to those of global convergence. The determination and evaluation of interpolating polynomials of high degree can be too time-consuming and can also lead to difficult problems associated with roundoff error.' (1977)

'Although Lagrangian interpolation is sometimes useful in theoretical investigations, it is rarely used in practical computations.' (1985)

'Interpolation is a notoriously tricky problem from the point of view of numerical stability.' (1990)

The origin of this view is the fact that some of the methods one might naturally try for evaluating polynomial interpolants are slow or numerically unstable or both. For example, if you write down the Lagrange interpolation formula in its most obvious form and implement it on a computer as written, you get an algorithm that requires O(n²) work per evaluation point. (Partly because of this, books warn readers that they should use Newton interpolation formulae rather than Lagrange – another myth.) And if you compute interpolants 'linear algebra style', by setting up a Vandermonde matrix whose columns contain samples of 1, x, x², ..., x^n at the grid points, your numerical method is exponentially unstable. This is the 'polyval/polyfit' algorithm of Matlab, and I am guilty of propagating Myth 2 myself by using this algorithm in my textbook Spectral Methods in Matlab [4]. Rounding errors on a computer destroy all accuracy of this method even for n = 60, let alone n = 10,000 as in the plot above.

Or how about n = 1,000,000? Here is a plot of the polynomial interpolant to f(x) = sin(10/x) in a million Chebyshev points. The plot was obtained in about 30 seconds on my laptop by evaluating the interpolant at 2,000 points clustered near the zero.

[Figure: the polynomial interpolant to f(x) = sin(10/x) in a million Chebyshev points.]

The fast and stable algorithm that makes these calculations possible comes from a representation of the Lagrange interpolant known as the barycentric formula, published by Salzer in 1972 [5]:

    p_n(x) = Σ′_{j=0}^{n} [ (−1)^j f_j / (x − x_j) ]  /  Σ′_{j=0}^{n} [ (−1)^j / (x − x_j) ],

with the special case p_n(x) = f_j if x = x_j. The primes on the summation signs signify that the terms j = 0 and j = n are multiplied by 1/2. The work required is O(n) per evaluation point, and though the divisions by x − x_j may look dangerous for x ≈ x_j, the formula is numerically stable, as was proved by Nick Higham in 2004 [6].
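The barycentric formula translates directly into a few lines of code. Below is a hedged NumPy sketch (an illustration, not the Chebfun implementation) that evaluates a Chebyshev interpolant by Salzer's formula; the work is O(n) per evaluation point, and exact coincidence with a node is handled by the special case p_n(x_j) = f_j.

```python
import numpy as np

def bary_cheb(fj, x):
    """Evaluate the interpolant of the samples fj, taken at the Chebyshev points
    x_j = cos(j*pi/n), j = 0..n, at the points x, via Salzer's barycentric formula."""
    fj = np.asarray(fj, dtype=float)
    n = len(fj) - 1
    xj = np.cos(np.arange(n + 1) * np.pi / n)       # Chebyshev points
    w = (-1.0) ** np.arange(n + 1)                  # weights (-1)^j ...
    w[0] *= 0.5                                     # ... with the two end terms halved
    w[-1] *= 0.5
    x = np.atleast_1d(np.asarray(x, dtype=float))
    p = np.empty_like(x)
    for i, xi in enumerate(x):
        d = xi - xj
        hit = np.where(d == 0.0)[0]
        if hit.size:                                # x coincides exactly with a node
            p[i] = fj[hit[0]]
        else:
            c = w / d
            p[i] = np.dot(c, fj) / np.sum(c)
    return p

# Quick check: interpolate cos(10x) at degree n = 100 and measure the error.
n = 100
xj = np.cos(np.arange(n + 1) * np.pi / n)
xx = np.linspace(-1, 1, 5001)
print(np.max(np.abs(bary_cheb(np.cos(10 * xj), xx) - np.cos(10 * xx))))   # ~1e-15
```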
Myth 3. Best approximations are optimal

This one sounds true by definition!

'Since the Remes algorithm, or indeed any other algorithm for producing genuine best approximations, requires rather extensive computations, some interest attaches to other more convenient procedures to give good, if not optimal, polynomial approximations.' (1968)

'Minimal polynomial approximations are clearly suitable for use in function evaluation routines, where it is advantageous to use as few terms as possible in an approximation.' (1968)

'Ideally, we would want a best uniform approximation.' (1980)

Though the statement of this myth looks like a tautology, there is content in it. The 'best approximation' is a common name for the unique polynomial p*_n that minimises ‖f − p*_n‖. So a best approximant is optimal in the maximum norm, but is it really the best in practice?

As the first quotation suggests, computing p*_n is not a trivial matter, since the dependence on f is nonlinear. By contrast, computing a Chebyshev interpolant with the barycentric formula is easy. Here our prejudices about value for money begin to intrude. If best approximations are hard to compute, they must be valuable!

Two considerations make the truth not so simple. First of all, the maximum-norm accuracy difference between Chebyshev interpolants and best approximations can never be large, for Ehlich and Zeller proved in 1966 [7] that ‖f − p_n‖ cannot exceed ‖f − p*_n‖ by more than the factor 2 + (2/π) log(n + 1). Usually the difference is less than that, and in fact, the best known error bounds for functions that have ν derivatives or are analytic, the two smoothness classes mentioned under Myth 1, are just a factor of 2 larger for Chebyshev interpolants than for best approximations.

Secondly, according to the equioscillation theorem going back to Chebyshev in the 1850s, the maximum error of a best approximation is always attained at at least n + 2 points in [−1, 1]. For example, the black curve below is the error curve f(x) − p*_n(x), x ∈ [−1, 1], for the best approximant to |x − 1/4| of degree 100, equioscillating between 102 extreme values ≈ ±0.0027. The red curve corresponds to the polynomial interpolant p_n in Chebyshev points. It is clear that for most values of x, |f(x) − p_n(x)| is much smaller than |f(x) − p*_n(x)|. Which approximation would be more useful in an application? I think the only reasonable answer is, it depends. Sometimes one really does need a guarantee about worst-case behaviour. In other situations, it would be wasteful to sacrifice so much accuracy over 95% of the range just to gain one bit of accuracy in a small subinterval.

[Figure: error curves for the degree 100 best approximation (black) and Chebyshev interpolant (red) to |x − 1/4|.]
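The Ehlich–Zeller bound is easy to test numerically. The sketch below (NumPy again; the best-approximation error level 0.0027 is taken from the article rather than recomputed, since a Remez implementation would not fit in a few lines) measures the Chebyshev interpolation error for |x − 1/4| at degree 100 and checks that it lies within the factor 2 + (2/π) log(n + 1) of that level.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

f = lambda x: np.abs(x - 0.25)
n = 100
xj = np.cos(np.arange(n + 1) * np.pi / n)          # Chebyshev points
p = C.chebfit(xj, f(xj), n)                        # Chebyshev interpolant of degree 100
xx = np.linspace(-1, 1, 200_001)
interp_err = np.max(np.abs(f(xx) - C.chebval(xx, p)))

best_err = 0.0027                                  # best-approximation error quoted in the text
bound = (2 + (2 / np.pi) * np.log(n + 1)) * best_err
print(f"Chebyshev interpolation error: {interp_err:.4f}")
print(f"Ehlich-Zeller bound:           {bound:.4f}")
print("within bound:", interp_err <= bound)
```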
Myth 4. Gauss quadrature has twice the order of accuracy of Clenshaw–Curtis

Quadrature formulae are usually derived from polynomials. We approximate an integral by a finite sum,

    I = ∫_{−1}^{1} f(x) dx  ≈  I_n = Σ_{k=0}^{n} w_k f(x_k),        (1)
and the weights {w_k} are determined by the principle of interpolating f by a polynomial at the points {x_k} and integrating the interpolant. Newton–Cotes quadrature corresponds to equispaced points, Clenshaw–Curtis quadrature to Chebyshev points, and Gauss quadrature to Legendre points. Almost every textbook first describes Newton–Cotes, which achieves I_n = I exactly if f is a polynomial of degree n, and then shows that Gauss has twice this order of exactness: I_n = I if f is a polynomial of degree 2n + 1. Clenshaw–Curtis is occasionally mentioned, but it only has order of exactness n, no better than Newton–Cotes.
'However, the degree of accuracy for Clenshaw–Curtis quadrature is only n − 1.' (1997)

'Clenshaw–Curtis rules are not optimal in that the degree of an n-point rule is only n − 1, which is well below the maximum possible.' (2002)

This ubiquitous emphasis on order of exactness is misleading. Textbooks suggest that the reason Gauss quadrature gives better results than Newton–Cotes is its higher order of exactness, but this is not correct. The problem with Newton–Cotes is that the sample points are equally spaced: the Runge phenomenon. In fact, as Pólya proved in 1933 [8], Newton–Cotes quadrature does not converge as n → ∞, in general, even if f is analytic.

Clenshaw–Curtis and Gauss quadratures behave entirely differently. Both schemes converge for all continuous integrands, and if the integrand is analytic, the convergence is geometric. Clenshaw–Curtis is easy to implement, using either the Fast Fourier Transform or an algorithm of Waldvogel in 2006 [9], and one reason it gets little attention may be that Gauss quadrature had a big head start, invented in 1814 [10] instead of 1960 [11]. Both Clenshaw–Curtis and Gauss quadrature are practical even if n is in the millions, in the latter case because nodes and weights can be calculated by an algorithm implemented in Chebfun due to Glaser, Liu and Rokhlin in 2007 [12]. (Indeed, since Glaser–Liu–Rokhlin, it is another myth to imagine that Gauss quadrature is only practicable for small values of n.)

With twice the order of exactness, we would expect Gauss quadrature to converge twice as fast as Clenshaw–Curtis. Yet it does not. Unless f is analytic in a large region surrounding [−1, 1] in the complex plane, one typically finds that Clenshaw–Curtis quadrature converges at about the same rate as Gauss, as illustrated by these curves for f(x) = exp(−1/x²):

[Figure: quadrature error against degree n for Clenshaw–Curtis and Gauss applied to f(x) = exp(−1/x²).]

A theorem of mine from 2008 makes this observation precise. If f has a νth derivative of bounded variation V, Gauss quadrature can be shown to converge at the rate O(V(2n)^{−ν}), the factor of 2 reflecting its doubled order of exactness. The theorem asserts that the same rate O(V(2n)^{−ν}), with the same factor of 2, is also achieved by Clenshaw–Curtis. (Folkmar Bornemann (private communication) has pointed out that both of these rates can probably be improved by one further power of n.) The explanation for this surprising result goes back to O'Hara and Smith in 1968 [13]. It is true that (n + 1)-point Gauss quadrature integrates the Chebyshev polynomials T_{n+1}, T_{n+2}, ... exactly whereas Clenshaw–Curtis does not. However, the error that Clenshaw–Curtis makes consists of aliasing them to the Chebyshev polynomials T_{n−1}, T_{n−2}, ... and integrating these correctly. As it happens, the integrals of T_{n+k} and T_{n−k} differ by only O(n^{−3}), and that is why Clenshaw–Curtis is more accurate than its order of exactness seems to suggest.
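The exp(−1/x²) comparison is easy to reproduce. Here is a hedged NumPy sketch (an illustration, not the code behind the article's figure): it applies (n + 1)-point Gauss–Legendre and Clenshaw–Curtis rules, the latter implemented directly from its defining principle of interpolating at Chebyshev points and integrating the interpolant, and prints the two errors side by side. The reference value of the integral follows from the substitution t = 1/x and integration by parts.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from numpy.polynomial.legendre import leggauss
from math import erfc, exp, pi, sqrt

def f(x):
    """The integrand exp(-1/x^2), with f(0) = 0 taken by continuity."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    nz = x != 0
    out[nz] = np.exp(-1.0 / x[nz] ** 2)
    return out

I_exact = 2 * (exp(-1) - sqrt(pi) * erfc(1.0))     # integral of f over [-1, 1]

def gauss(n):
    """(n+1)-point Gauss-Legendre approximation to the integral."""
    x, w = leggauss(n + 1)
    return np.dot(w, f(x))

def clenshaw_curtis(n):
    """(n+1)-point Clenshaw-Curtis: interpolate at Chebyshev points, integrate exactly."""
    xj = np.cos(np.arange(n + 1) * np.pi / n)
    a = C.chebfit(xj, f(xj), n)                    # Chebyshev coefficients of the interpolant
    k = np.arange(0, n + 1, 2)                     # only even-degree terms have nonzero integral
    return np.sum(2 * a[k] / (1 - k ** 2))         # int_{-1}^{1} T_k dx = 2/(1-k^2) for even k

for n in [10, 20, 40, 80]:
    print(f"n = {n:3d}   Gauss err = {abs(gauss(n) - I_exact):.2e}"
          f"   Clenshaw-Curtis err = {abs(clenshaw_curtis(n) - I_exact):.2e}")
```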
Myth 5. Gauss quadrature is optimal

Gauss quadrature may not be much better than Clenshaw–Curtis, but at least it would appear to be as accurate as possible, the gold standard of quadrature formulae.

'The precision is maximised when the quadrature is Gaussian.' (1982)

'In fact, it can be shown that among all rules using n function evaluations, the n-point Gaussian rule is likely to produce the most accurate estimate.' (1989)

There is another misconception here, quite different from the one just discussed as Myth 4. Gauss quadrature is optimal as measured by polynomial order of exactness, but that is a skewed measure. The power of polynomials is nonuniform: they have greater resolution near the ends of an interval than in the middle. Suppose, for example, that f(x) is an analytic function on [−1, 1], which means it can be analytically continued into a neighbourhood of [−1, 1] in the complex x-plane. Then polynomial approximants to f, whether Chebyshev or Legendre interpolants or best approximants, will converge at a geometric rate O(ρ^{−n}) determined by how close any singularities of f in the plane are to [−1, 1]. To be precise, ρ is the sum of the semiminor and semimajor axis lengths of the largest ellipse with foci ±1 inside which f is analytic and bounded. But ellipses are narrower near the ends than in the middle. If f has a singularity at x_0 = iε for some small ε, then we get O(ρ^{−n}) convergence with ρ ≈ 1 + ε. If f has a singularity at 1 + ε, on the other hand, the parameter becomes ρ ≈ 1 + √(2ε), corresponding to much faster convergence. A function with a singularity at 1.01 converges 14 times faster than one with a singularity at 0.01i.

Quadrature rules generated by polynomials, including both Gauss and Clenshaw–Curtis, show the same nonuniformity. This might seem unavoidable, but in fact, there is no reason why a quadrature formula (1) needs to be derived from polynomials. By introducing a change of variables, one can generate alternative formulae based on interpolation by transplanted polynomials, which may converge up to π/2 times faster than Gauss or Clenshaw–Curtis quadrature for many functions. This idea was developed in a paper of mine with Nick Hale in 2008 [14] and is related to earlier work by Kosloff and Tal-Ezer in 1993 [15].
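The factor of 14 quoted above follows directly from the ellipse parameter ρ. Here is a short plain-Python check using only the geometry described in the preceding paragraph: for foci ±1 the axis lengths satisfy a² − b² = 1, a singularity at iε fixes the semiminor axis b = ε, a singularity at 1 + ε fixes the semimajor axis a = 1 + ε, and ρ is the sum a + b.

```python
import math

def rho_imaginary(eps):
    """rho for the ellipse with foci +-1 through a singularity at i*eps."""
    b = eps                         # semiminor axis
    a = math.sqrt(1 + eps ** 2)     # semimajor axis, from a^2 - b^2 = 1
    return a + b

def rho_real(eps):
    """rho for the ellipse with foci +-1 through a singularity at 1 + eps."""
    a = 1 + eps                     # semimajor axis
    b = math.sqrt(a ** 2 - 1)       # semiminor axis
    return a + b

eps = 0.01
r_real = rho_real(eps)              # singularity at 1.01
r_imag = rho_imaginary(eps)         # singularity at 0.01i
print(r_real, r_imag)                               # about 1.15 and 1.01
print(math.log(r_real) / math.log(r_imag))          # ratio of convergence rates: about 14
```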
The following theorem applies to one of the transformed methods Hale and I proposed. Let f be a function that is analytic in an ε-neighbourhood of [−1, 1] for ε ≤ 0.05. Then whereas Gauss quadrature converges at the rate I_n − I = O((1 + ε)^{−2n}), transformed Gauss quadrature converges 50% faster, I_n − I = O((1 + ε)^{−3n}). Here is an illustration for f(x) = 1/(1 + 25x²).
[Figure: quadrature error against degree n for Gauss and transformed Gauss applied to f(x) = 1/(1 + 25x²).]

The fact that some quadrature formulae converge up to π/2 times faster than Gauss as n → ∞ is probably not of much practical importance. The importance is conceptual.

Myth 6. Polynomial root-finding is dangerous

Our final myth originates with Jim Wilkinson (1919–1986), a hero of mine who taught me two courses in graduate school at Stanford. Working with Alan Turing on the Pilot Ace computer in 1950, Wilkinson found that attempts to compute roots of even some low-degree polynomials failed dramatically. He publicised this discovery widely.

'Our main object in this chapter has been to focus attention on the severe inherent limitations of all numerical methods for finding the zeros of polynomials.' (1963)

'Beware: Some polynomials are ill-conditioned!' (1992)

The first of these quotations comes from Wilkinson's book on rounding errors [16], and he also coined the memorable phrase 'the perfidious polynomial' as the title of a 1984 article that won the Chauvenet Prize for outstanding mathematical exposition [17].

What Wilkinson discovered was the extreme ill-conditioning of roots of certain polynomials as functions of their coefficients. Specifically, suppose a polynomial p_n is specified by its coefficients in the form a_0 + a_1 x + ··· + a_n x^n. If p_n has roots near the unit circle in the complex plane, these pose no difficulties: they are well-conditioned functions of the coefficients a_k and can be computed accurately by Matlab's 'roots' command, based on the calculation of eigenvalues of a companion matrix containing the coefficients. Roots far from the circle, however, such as roots on the interval [−1, 1], can be so ill-conditioned as to be effectively uncomputable. The monomials x^k form exponentially bad bases for polynomials on [−1, 1].

The flaw in the argument is that it says nothing about the condition of roots of polynomials as functions of their values. For effective root-finding on [−1, 1] based on pointwise samples, all one must do is fix the basis: replace the monomials x^k, which are orthogonal polynomials on the unit circle, by the Chebyshev polynomials T_k(x), which are orthogonal on the interval. Suppose a polynomial p_n is specified by its coefficients in the form a_0 T_0(x) + a_1 T_1(x) + ··· + a_n T_n(x). If p_n has roots near [−1, 1], these are well-conditioned functions of the coefficients a_k, and they can be computed accurately by solving an eigenvalue problem involving a 'colleague matrix'. The details were worked out by Specht in 1957 [18] and Good in 1961 [19].

Chebfun finds roots of a function f on [−1, 1] by approximating it by a polynomial expressed in Chebyshev form and then solving a colleague-matrix eigenvalue problem, and if the degree is greater than 100, first subdividing the interval recursively to reduce it. These ideas originate with John Boyd in 2002 [20] and are extraordinarily effective. Far from being exceptionally troublesome, polynomial root-finding when posed in this fashion begins to emerge as the most tractable of all root-finding problems, for we can solve the problem globally with just O(n²) work to get all the roots in an interval to high accuracy.

For example, the function f(x) = sin(1000πx) on [−1, 1] is represented in Chebfun by a polynomial of degree 4091. It takes 2 seconds on my laptop to find all 2001 of its roots in [−1, 1], and the maximum deviation from the exact values is 4.4 × 10^{−16}.

Here is another illustration of the robustness of polynomial root-finding on an interval. In Chebfun, we have plotted the function f(x) = exp(x/2)(sin(5x) + sin(101x)) and then executed the commands r = roots(f-round(f)), plot(r,f(r),'.'). This sequence solves a collection of hundreds of polynomial root-finding problems to locate all the points where f takes a value equal to an integer or a half-integer, and plots them as dots. The computation took 2/3 of a second.

[Figure: f(x) = exp(x/2)(sin(5x) + sin(101x)) with dots at the points where f takes an integer or half-integer value.]
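The basis-fixing idea can be tried outside Chebfun too. Below is a hedged NumPy sketch (not the Chebfun code, and with the article's sin(1000πx) example scaled down so that it runs in a blink): it interpolates a function at Chebyshev points, discards negligible trailing coefficients, and calls numpy's chebroots, which computes the roots of a Chebyshev series via the eigenvalues of a companion-matrix analogue for the Chebyshev basis.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_roots_on_interval(f, n):
    """All roots of f in [-1, 1], found by interpolating f in n+1 Chebyshev points
    and computing the roots of the interpolant in the Chebyshev basis."""
    xj = np.cos(np.arange(n + 1) * np.pi / n)
    a = C.chebfit(xj, f(xj), n)                    # Chebyshev coefficients of the interpolant
    a = C.chebtrim(a, 1e-13)                       # drop negligible trailing coefficients
    r = C.chebroots(a)                             # eigenvalues of a colleague-type matrix
    r = r[np.abs(r.imag) < 1e-10].real             # keep the (numerically) real roots ...
    return np.sort(r[np.abs(r) <= 1 + 1e-10])      # ... that lie in [-1, 1]

# Scaled-down version of the article's example: sin(50*pi*x) has 101 roots in [-1, 1].
f = lambda x: np.sin(50 * np.pi * x)
roots = cheb_roots_on_interval(f, 300)             # degree 300 is ample for this function
exact = np.arange(-50, 51) / 50.0
print(len(roots))                                  # expect 101
if len(roots) == len(exact):
    print(np.max(np.abs(roots - exact)))           # expect roughly 1e-13 or smaller
```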
Conclusion
paced points. latter is the subject of discrete Fourier analysis, and one lentThe to interpolation of periodic functions by series of sines andcannot help of eigenvalues of a companion matrix containing the coefficients. noting that whereas there is widespread suspicion that it is not safe to compute with cosines in equispaced points. The latter is the subject of discrete Roots far from the circle, however, such as roots on the interval 7 polynomials, nobody worriesand about the Fast Fourier Inthere the end Fourier analysis, one cannot help notingTransform! that whereas is this may [−1, 1], can be so ill-conditioned as to be effectively uncomputable. be the biggest difference between Fourier and the difference widespread suspicion that it is not safepolynomial to computeinterpolants, with polynomiThe monomials xk form exponentially bad bases for polynomials in their reputations. als, nobody worries about the Fast Fourier Transform! In the end on [−1, 1]. bonus, free of charge. this amay be the biggest difference between Fourier and polynomial The flaw in the argument is that it says nothing about theAnd con- here’s dition of roots of polynomials as functions of their values. For interpolants, the difference in their reputations. here’s a bonus, free of charge. Lagrange interpolation effective root-finding on [−1, 1] based on pointwise samples, all 7. And Myth Lagrange discovered one must do is fix the basis: replace the monomials xk , which It was Waring, in Myth the Philosophical Transactions of theLagrange Royal Society in 1779. Euler are orthogonal polynomials on the unit circle, by the Chebyshev 7. Lagrange discovered usedSupthe formula in 1783, and Lagrange in 1795. polynomials Tk (x), which are orthogonal on the interval. interpolation pose a polynomial pn is specified by its coefficients in the form a0 T0 (x) + a1 T1 (x) + · · · + an Tn (x). If pn has roots near [−1, 1], It was Waring in 1779 [21]. Euler used the formula in 1783, and these are well-conditioned functions of the coefficients ak , and Lagrange in 1795. h 8
References

1. Weierstrass, K. (1885) Über die analytische Darstellbarkeit sogenannter willkürlicher Funktionen einer reellen Veränderlichen, Sitzungsberichte der Akademie zu Berlin, pp. 633–639 and 789–805.
2. Faber, G. (1914) Über die interpolatorische Darstellung stetiger Funktionen, Jahresber. Deutsch. Math. Verein., vol. 23, pp. 190–210.
3. Runge, C. (1901) Über empirische Funktionen und die Interpolation zwischen äquidistanten Ordinaten, Z. Math. Phys., vol. 46, pp. 224–243.
4. Trefethen, L.N. (2000) Spectral Methods in MATLAB, SIAM, Philadelphia.
5. Salzer, H.E. (1972) Lagrangian interpolation at the Chebyshev points x_{n,v} ≡ cos(vπ/n), v = 0(1)n; some unnoted advantages, Comp. J., vol. 15, pp. 156–159.
6. Higham, N.J. (2004) The numerical stability of barycentric Lagrange interpolation, IMA J. Numer. Anal., vol. 24, no. 4, pp. 547–556.
7. Ehlich, H. and Zeller, K. (1966) Auswertung der Normen von Interpolationsoperatoren, Math. Ann., vol. 164, pp. 105–112.
8. Pólya, G. (1933) Über die Konvergenz von Quadraturverfahren, Math. Z., vol. 37, pp. 264–286.
9. Waldvogel, J. (2006) Fast construction of the Fejér and Clenshaw–Curtis quadrature rules, BIT Num. Math., vol. 46, no. 1, pp. 195–202.
10. Gauss, C.F. (1814) Methodus nova integralium valores per approximationem inveniendi, Comment. Soc. R. Scient. Göttingensis Rec., vol. 3, pp. 39–76.
11. Clenshaw, C.W. and Curtis, A.R. (1960) A method for numerical integration on an automatic computer, Numer. Math., vol. 2, pp. 197–205.
12. Glaser, A., Liu, X. and Rokhlin, V. (2007) A fast algorithm for the calculation of the roots of special functions, SIAM J. Sci. Comp., vol. 29, no. 4, pp. 1420–1438.
13. O'Hara, H. and Smith, F.J. (1968) Error estimation in the Clenshaw–Curtis quadrature formula, Comp. J., vol. 11, pp. 213–219.
14. Hale, N. and Trefethen, L.N. (2008) New quadrature formulas from conformal maps, SIAM J. Numer. Anal., vol. 46, pp. 930–948.
15. Kosloff, D. and Tal-Ezer, H. (1993) A modified Chebyshev pseudospectral method with an O(N^{−1}) time step restriction, J. Comp. Phys., vol. 104, pp. 457–469.
16. Wilkinson, J.H. (1994) Rounding Errors in Algebraic Processes (Prentice-Hall Series in Automatic Computation), Dover Publications.
17. Wilkinson, J.H. (1984) The perfidious polynomial, MAA Stud. Num. Anal.
18. Specht, W. (1957) Die Lage der Nullstellen eines Polynoms. III, Math. Nachr., vol. 16, pp. 363–389.
19. Good, I.J. (1961) The colleague matrix, a Chebyshev analogue of the companion matrix, Quart. J. Math., vol. 12, pp. 61–68.
20. Boyd, J.P. (2002) Computing zeros on a real interval through Chebyshev expansion and polynomial rootfinding, SIAM J. Numer. Anal., vol. 40, no. 5, pp. 1666–1682.
21. Waring, E. (1779) Problems concerning interpolations, Phil. Trans., vol. 69, pp. 59–67.
Urban Maths: Virtual Unreality
A. Townie

Driving through an unfamiliar city centre recently, with its poorly signposted routes and confusing one-way systems, I became increasingly frustrated and completely lost. Sometimes I went with the flow and took the direction being followed by the majority of the traffic; at other times I deliberately avoided such directions. I began to feel like a particle randomly diffusing through an incomprehensible network of roads! When I eventually reached my destination and regained my equilibrium I realised the process I'd followed through the streets reminded me of a technique for solving linear networks that I've only come across in the relatively recent past.
Imagine an electrical circuit comprising a number of interconnected electrical resistors. Focus on a particular interconnection node and the nodes and resistors to which it is directly connected. For example, let's look at a segment of the circuit where a node, m say, is surrounded by nodes q, r and s, to which it is connected via resistors R_1, R_2 and R_3, as in Figure 1. We start conventionally by analysing this part of the circuit using Kirchhoff's current law and Ohm's law. Kirchhoff's current law says that the sum of the electrical currents out of any interconnection node must be zero (this is basically a statement of the conservation of electric charge). So, with currents i_1, i_2 and i_3 from Figure 1 we have:
    i_1 + i_2 + i_3 = 0.        (1)
Ohm's law relates the voltage difference across a resistor to the current flowing through it and the value of the resistance. Using V_x to represent the voltage at node x, we have for the three branches of Figure 1:
    V_m − V_q = i_1 R_1,    V_m − V_r = i_2 R_2,    V_m − V_s = i_3 R_3.        (2)

[Figure 1: Segment of electrical network, with node m connected to nodes q, r and s through resistors R_1, R_2 and R_3.]
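To make the bookkeeping concrete, here is a tiny Python sketch (with made-up illustrative values, not data from the article) that encodes equations (1) and (2) as a linear system in the unknowns V_m, i_1, i_2, i_3 and solves it.

```python
import numpy as np

# Illustrative (made-up) data: voltages at the neighbouring nodes and the resistances.
Vq, Vr, Vs = 5.0, 1.0, 2.0          # volts
R1, R2, R3 = 10.0, 20.0, 40.0       # ohms

# Unknowns: [Vm, i1, i2, i3].  First row is Kirchhoff's law (1),
# the remaining rows are the three Ohm's-law equations (2), written as Vm - i*R = V_neighbour.
A = np.array([[0.0, 1.0, 1.0, 1.0],
              [1.0, -R1, 0.0, 0.0],
              [1.0, 0.0, -R2, 0.0],
              [1.0, 0.0, 0.0, -R3]])
b = np.array([0.0, Vq, Vr, Vs])

Vm, i1, i2, i3 = np.linalg.solve(A, b)
print(f"Vm = {Vm:.4f} V,  currents: {i1:.4f}, {i2:.4f}, {i3:.4f} A  (sum = {i1 + i2 + i3:.1e})")
```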
Divide each of the equations in (2) by its value of resistance, sum the resulting equations and make use of equation (1) to obtain, after