ACM Communications in Computer Algebra, Vol 41, No. 3, September 2007

Formally Reviewed Communication

Useful Computations Need Useful Numbers

David R. Stoutemyer
Stoutemyer Consultants LLC
Contact: [email protected]

Abstract

Most of us have taken the exact rational and approximate numbers in our computer algebra systems for granted for a long time, not thinking to ask if they could be significantly better. With exact rational arithmetic and adjustable-precision floating-point arithmetic to precision limited only by the total computer memory or our patience, what more could we want for such numbers? It turns out that there is much more that can be done that permits us to obtain exact results more often, more intelligible results, approximate results guaranteed to have requested error bounds, and recovery of exact results from approximate ones.

1 Introduction

Computer algebra systems now incorporate extremely advanced mathematics, some of which is of interest to only a small portion of customers. However, in our race to add increasingly sophisticated capabilities, we have overlooked opportunities to substantially improve one of the most fundamental features — the ground domains of exact rational and approximate numbers.

Most computer algebra systems offer exact rational arithmetic using however much storage is necessary, together with approximate arithmetic that is usually floating-point. Moreover, the precision of the approximate arithmetic is often adjustable up to an amount that exhausts either memory or the user's patience. Some computer algebra systems also offer an add-on approximate interval arithmetic package, perhaps also with adjustable precision. This seems so much compared to the limited floating-point and integer arithmetic of typical numeric software that it is natural not to think about how the presentation and usefulness of computer algebra numbers could be improved. However, reconsideration of what we really want from computer algebra suggests easy ways to significantly improve the comprehensibility, power and veracity of these basic numeric ground domains.

This paper begins with a discussion of the goals of computer algebra, then suggests several ways that the treatment of exact rational and approximate numbers in computer algebra systems can be integrated to better meet those goals, including:

a) A more automatic and seamless combination of exact and approximate arithmetic.

b) A more automatic and seamless use of adaptive precision interval arithmetic and self-validating algorithms.

c) A more intelligible and informative adaptive display of intervals and exact numbers in expressions.


d) The use of infinitesimally-perturbed numbers and multi-interval arithmetic, together with generalized limits.

e) The optional recovery of exact numbers from approximate ones.

Some of these ideas are partially present in some computer algebra systems. However, there is substantial extra benefit if all or most of the ideas are fully implemented together using an interface that automates their use. The purpose of this article is to motivate these improvements, suggest them as research ideas for student or post-graduate computer algebra research, and suggest how they might be implemented.

2 Important Goals for Computer Algebra Systems

"The goal of computing is insight, not numbers" — R. W. Hamming

Richard Hamming was probably thinking primarily of numerical methods, but for computer algebra we can append the corollary: "The goal of computing isn't incomprehensible formulas either."

Initial Goal: The usual initial goal of computer algebra is the most concise exact closed-form explicit results that are practical.

Theoretical limitations, implementation limitations, and exhaustion of resources can preclude explicit exact closed-form results. Moreover, such results are often not concise enough to be of much use. If such a result is unobtainable, then we can still seek a useful implicit exact result, such as one involving the Maple™ RootOf(equation, variable) function [12], the similar Mathematica® Root function [14], or a multivariate generalization of them to a system of equations and/or inequalities, where these constraints are simpler than the original problem. This might reveal some useful properties of an explicit result, and perhaps we can use the implicit result in further symbolic computations, with automatic simplification modulo the implicit system of equations and inequalities. For example, the computer algebra system can automatically reduce exponents in result polynomials modulo a constraint polynomial or modulo a Gröbner basis, and we can use implicit differentiation to determine derivatives of an implicit result. However, we usually want explicit results ultimately, even if they must be approximate. Therefore:

Alternative 1: If we can't obtain a satisfactory explicit exact closed-form result, often the next best alternative is an infinite generalized series or product result, such as
$$\sum_{n=1}^{\infty} \frac{x^{n/2}\,\ln(1+x)^n}{n!+e^n}.$$

(Series are more useful if subsequent operations distribute over sums, whereas products are more useful if subsequent operations distribute over products.)


Alternative 2: If a useful infinite series or product is also unobtainable, often the next best alternative is a truncated generalized series or product result, such as
$$\frac{x^{1/2}\ln(1+x)}{1+e} + \frac{x^{3/2}\ln(1+x)^2}{2+e^2} + o\!\left(x^{3/2}\ln(1+x)^2\right).$$

Alternative 3: If a useful truncated series or product is also unobtainable, often the next best alternative is an approximate numeric result that is guaranteed to any default or requested accuracy. This could be a number, an array of such numbers, or an expression containing indeterminates together with approximate numbers obtained by interpolation or some other technique.

Sometimes these alternatives are useful even when a prior alternative is successful. For example, an infinite or truncated series can most succinctly reveal important qualitative information about asymptotic behavior. As another example, for a relatively narrow interval, quadrature can avoid the catastrophic cancellation associated with approximating the difference in an anti-derivative at the two endpoints.

However, it currently requires a sophisticated, experienced and patient user to proceed through whichever of these successive alternatives are provided in a computer algebra system. Often they are buried in add-on packages that aren't fully and seamlessly integrated into the system, and often they use differing input conventions. Wouldn't it be nice if there were a mode wherein the system would automatically proceed through these alternatives until successful? For example, the system could effectively say: "I am unable to obtain an exact result or an infinite series, but here are the first few terms of a series result, together with an approximate numeric result guaranteed to 6 significant digits." The system could then offer to compute more terms and more significant digits or automatically proceed to compute them while the user views the initial results.

3 Artfully Combine Exact and Approximate Arithmetic

Most computer algebra systems operate by default in exact mode: If they cannot determine an entirely exact result, they return the input, perhaps partially simplified. Most computer algebra systems also have an optional approximate mode or an approximate function, such as the Maple [12] evalf(...) or the similar Mathematica [14] N[...] function, that uses approximate arithmetic throughout, together with quadrature instead of symbolic integration, iterative rather than exact equation solving, etc. For example, in a polynomial result, all of the numbers are forced to floating point, perhaps excepting the exponents.

However, perhaps some but not all of the coefficients, matrix elements, or equation solutions in a result could have been computed exactly. Moreover, when a component can't be entirely computed exactly, perhaps the early portions of its computation could have been done exactly, improving accuracy and perhaps also speed. For example, in an iterated integral, perhaps some of the innermost integrations could be done exactly.

The two extreme alternatives of exact and approximate modes don't permit the user effortlessly to see, in one unified result, all of the computable exact parts together with all of the necessarily approximate values. Also, hammering an expression with an approximation mode or function can preclude the beneficial effects of doing at least some of the earlier steps exactly. At the other extreme, insisting on a result


in which every computation for every component is exact can preclude seeing any exact components by virtue of exhausting memory or patience.

For these reasons, the TI computer algebra calculators have an automatic arithmetic mode. This mode computes in exact mode as far as it can, then computes approximately what it can't compute exactly. The form of the numbers in a result indicates which numbers are exact rational numbers and which are approximate floating-point numbers.

a) If a function or an operator has all constant operands and any of them are floating point, then the result is floating point. Otherwise the result is an exact symbolic constant. For example, ln(2) + ln(4) → 3 · ln(2), whereas π + 0.0 → 3.14159. As another example, (4x² + 5.0x) + (x²/3 + x/3) → 13x²/3 + 5.33333x. (Floating-point significands have 14 digits, but the default is to display only 6.)

b) For some operations, if memory is exhausted while attempting to compute a component exactly, then an attempt is made to compute the component approximately.

c) If some algorithms can't obtain an exact result for some components but can obtain an approximate one, they do so. For example,

solve(x · e^x − π · e^x + x · arctan(x) = π · arctan(x), x) → (x − π) · (e^x + arctan(x)) = 0 → x = π or x = −0.606555.

As another example:
$$\int_0^1\!\!\int_0^y \frac{h\cdot\left(\sqrt{x}\cdot e^x+1\right)}{\sqrt{x}\cdot\left(e^x+2\sqrt{x}+1\right)}\,dx\,dy \;\to\; h\cdot\int_0^1 \ln\!\left(\frac{e^y+2\sqrt{y}+1}{2}\right)dy \;\to\; 0.67637670822542\cdot h,$$

with quadrature used only for the outer integral.

It was originally expected that most users would want exact mode most often. However, automatic mode proved so popular that it was made the default mode. For an occasional intentional approximate result, the user can either wrap their input expression in an approx(...) function or press [Ctrl][Enter] rather than [Enter], which effectively does the same thing. For an intentional sequence of approximate results, it is more convenient to switch to approximate mode. Exact mode is rarely used by most users.

There could even be more options between the extremes of exact mode and approximate mode. For example, between approximate mode and the above automatic mode, it would be helpful to have a mode that also automatically approximated any irrational numeric constant sub-expressions that are more complicated than some threshold, such as being nested irrationalities. For example, 2 + √3 wouldn't automatically be approximated, but (2 + √3)^(1/3) would. Another alternative is to approximate any expression containing more than one irrationality, which is slightly more aggressive. For example, 2^(1/3) + √3 would be approximated. Carette [3] discusses measures that could perhaps also be applied to the default threshold for choosing between exact and approximate display.
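Returning to dispatch rule (a) above, here is a toy Python sketch of the float-contagion idea; the function name is mine and this is merely illustrative of the rule, not TI's implementation, which dispatches on symbolic expressions rather than bare numbers:

```python
# Toy sketch of automatic-mode rule (a): a constant operation returns a
# float if any operand is approximate, and otherwise remains exact.
from fractions import Fraction

def auto_add(x, y):
    if isinstance(x, float) or isinstance(y, float):
        return float(x) + float(y)    # an approximate operand forces a float result
    return x + y                      # all-exact operands: exact result

print(auto_add(Fraction(13, 3), Fraction(1, 3)))  # -> Fraction(14, 3), exact
print(auto_add(Fraction(16, 3), 0.0))             # -> 5.333333333333333
```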


4 Adaptive Precision Intervals & Self-validated Algorithms

"It makes me nervous to fly on airplanes, since I know they are designed using floating point arithmetic." — Alston Householder (a famous architect of floating-point algorithms and error analysis)

For the above quadrature giving 0.67637670822542 · h, I set the result display mode to show all 14 significand digits; but how many of these are correct? I am a trained numerical analyst and the implementer of the underlying adaptive quadrature algorithm. However, even I don't know how many of these digits are correct. Considering the continuity and spectrum of the integrand and its derivatives in the neighborhood of the integration interval, together with my experience with this implementation on such problems, I guess the result has about 6 to 12 significant digits. However, most users don't have the experience to make any guess, and many would naively believe all 14 computed digits. As a default, it is better not to display digits that aren't correct to within a few units in the last place: It is distracting, making the result less concise and therefore harder to comprehend. Worse yet, it can dangerously mislead many readers.

In contrast, there are interval-arithmetic software packages that guarantee the exact results are in their interval results by propagating the results of rounding or otherwise bounding both up and down. Not only are there interval versions of all the usual arithmetic operations and irrational functions, but there are also special interval algorithms for many operations such as quadrature, zero-finding and optimization.

Interval arithmetic was invented to allow floating-point arithmetic to produce guaranteed results. However, for computer algebra it is also quite useful to permit either or both endpoints of an interval to be arbitrary-precision rational numbers. Most interval arithmetic packages use only closed intervals. However, allowing each endpoint to be either open or closed can give significantly tighter results. For example, ⌊[999/1000, 1)⌋ → 0, whereas ⌊[999/1000, 1]⌋ → [0, 1].

Intervals also permit tighter designations for various different results often mapped to a single symbol in many systems. For example, although IEEE 754 arithmetic [8] has a representation for +∞, it doesn't have a separate representation for numbers strictly between its largest finite representable number F_max and +∞. Although overflow signals an exception, the default response to an exception is to proceed without a trap, and the default treatment for overflow is to round the result to the representation for +∞ or −∞, whichever is closer. As a result, (2.0 · F_max)/F_max → +∞/F_max → +∞ rather than 2.0 or the interval (F_max, +∞)/F_max → (1, +∞). There is no IEEE representation for this interval, so it would be more correct to degrade it to an IEEE NAN, which stands for "Not A Number". NAN is most conservatively viewed as a complex interval consisting of the entire infinite complex plane, because √−1 and 0/0 both produce NAN even though the latter is arguably the entire real line, perhaps depending on context. It would be nice if there were official separate representations for the entire infinite real line and the entire infinite complex plane. IEEE arithmetic doesn't have separate representations for proper subsets of that plane other than −∞, ∞, and real numbers from −F_max through F_max. As a result, sign(|0/0| + ∞) → NAN.
However, if we regard 0/0 as some unknown number in the interval [−∞, ∞] or the interval [−∞−∞·i, ∞+∞·i] and consequently simplify |0/0| to a representable interval [0, ∞], then we can more usefully simplify sign(|0/0| + ∞) to 1.0. Any implementer of interval arithmetic using IEEE arithmetic or its standard would be wise to avoid the built-in handling of overflow and generation of infinities or NAN. Many software implementations of 79
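As a small illustration of the open/closed endpoint distinction above, here is a minimal Python sketch (the class and flag names are my own) reproducing the floor example ⌊[999/1000, 1)⌋ → 0:

```python
# Sketch: an interval whose upper endpoint carries an open/closed flag;
# a full version would flag both ends (and allow perturbed endpoints).
from fractions import Fraction
from math import floor

class Interval:
    def __init__(self, lo, hi, hi_open=False):
        self.lo, self.hi, self.hi_open = Fraction(lo), Fraction(hi), hi_open

    def floor(self):
        lo, hi = floor(self.lo), floor(self.hi)
        if self.hi_open and self.hi == hi:
            hi -= 1                    # an open integer endpoint is excluded
        return Interval(lo, hi)        # a closed interval of integers

    def __repr__(self):
        return f"[{self.lo}, {self.hi}" + (")" if self.hi_open else "]")

print(Interval(Fraction(999, 1000), 1, hi_open=True).floor())  # [0, 0], i.e. 0
print(Interval(Fraction(999, 1000), 1).floor())                # [0, 1]
```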

Many software implementations of adjustable-precision floating-point have no limitation on magnitudes other than the total available memory. However, if they do have a limitation, it is better to represent a finite overflowed positive number by an open interval (F_max, ∞) than to round it up to ∞.

Even with such care, for any particular significand length, classic result intervals might be wider than several standard deviations for expected rounding errors. Moreover, result intervals can be wider than is possible for the worst combinations of rounding errors. The reason is that correlations between multiple occurrences of variables or sub-expressions can preclude the worst combination of interval endpoints. For example, if x has the interval [2, 3], then classic interval arithmetic computes x − x → [2 − 3, 3 − 2] → [−1, 1] rather than 0, not exploiting the fact that both terms are monotonically increasing and 100% correlated. As another example, the rounding errors in Gaussian elimination with pivoting are almost always highly correlated in a way that nearly cancels, but applying classic interval arithmetic to Gaussian elimination doesn't exploit that, therefore giving very pessimistic bounds that grow exponentially with the number of rows. For this reason, early fixed-precision interval arithmetic was often dismissed as giving uselessly loose intervals for results computed with more than a modest number of operations and/or built-in function calls. However, various improvements such as directed and affine interval arithmetic can give significantly tighter bounds. See, for example, [9]: http://www.cs.utep.edu/interval-comp/

In conjunction with this, there are specially-designed verification or self-validating algorithms for solving equations, quadrature, optimization, etc. that are far more effective than simply applying interval arithmetic to traditional non-interval algorithms. Some of these algorithms compute a result using ordinary floating-point arithmetic, but determine an a posteriori bound using perhaps interval arithmetic. See, for example, Rump [17], [18], and INTLAB [19]. Mathematica [14] uses some self-validating algorithms.

Even with such improvements, a result interval can be wider than desired — perhaps even if it corresponds tightly to the worst possible combinations of rounding errors. A few interval packages support adjustable precision arithmetic so that the user can repeat the calculation with increasing precision until the result intervals are as narrow as desired. However, what the user really wants is to request a result interval width directly. We can request a relative interval width for result intervals that exclude 0. However, we must be content with an absolute interval width for result intervals that include 0, and we can't avoid such intervals if the exact result is 0. To achieve a requested result interval width, an implementation could automatically iteratively double the significand length until the result bounds satisfy a pre-specified relative or absolute width — whichever is achieved first. Kreinovich and Rump [20] show that this requires at most 5.25 times as long as a single computation at the precision that would just suffice for our desired result interval width. However, there are two inefficiencies in this process:

a) The correct bits computed in previous iterations are not recycled to reduce computation time in subsequent iterations.
b) In many computations, some of the operands don't need to be computed with as much significand length as the desired number of significant digits in the result, and other operands must be computed with substantially more significand length than the desired number of significant digits in the result. For example, in a sum of a term t and a much smaller magnitude term s, it is wasted effort to compute

as many significant digits of s as of t, because those extra digits won't participate when the decimal points are aligned. As an extreme example, a sub-expression might not need any correct digits. For example, consider a goal of 13 significant digits for the expression (e^(1E−13) − 1.0) · (1E15 + arctan(e)). e^(1E−13) must be computed to about 26 significant digits to allow for the catastrophic cancellation when subtracting 1.0, whereas 0 significant digits is adequate for arctan(e) because |arctan(e)| < π/2 < 0.5E−13 · 1E15 = 50.0.

Both of these inefficiencies are elegantly addressed by continued fraction arithmetic, as described by Gosper [6] [http://www.tweedledum.com/rwg/cfup.htm] and implemented in LeLisp with additional innovations by Vuillemin [23]. These algorithms are on-line co-routines: Each operation remembers what it has computed so far and how to continue, and each operation requests additional precision from its operands on an as-needed basis. Moreover, the algorithms include exact rational arithmetic as a special case, unifying the treatment of exact rational and approximate numbers. This also avoids the radix-dependent inaccuracies of floating-point arithmetic. For example: 1/10 can't be represented exactly if the floating-point radix is a power of 2; and 1/3, 1/7, 1/11, ... can't be represented exactly if the floating-point radix is a power of 10.

The second above inefficiency could be addressed even for typical adjustable-precision floating-point arithmetic as follows:

a) Initially compute the result interval using a uniform initial significand length, remembering also all intermediate intervals together with the significand length used for each operation or function evaluation. If the final interval is tight enough, then we are done.

b) The last performed operation determines which of its operands need more significand length and estimates what those lengths should be from the result, its current operand intervals, and the significand length used for the operation. This operation also estimates what significand length is needed for its computation. Operands that need more significand length do the same for their operands, and so on until we reach input numbers. We then re-compute operands and/or operations that need more significand length, replacing the remembered intermediate results and significand lengths accordingly.

c) If the final interval isn't tight enough, go back to step b).

Computations that entail a very lengthy sequence of operations might have to treat subsequences as units done all at the same precision to avoid consuming too much memory for storing intermediate information. The initial significand length should ordinarily be at least as large as the size below which there is no further significant gain in speed. This would probably be about 16 digits if the implementation exploits the most common IEEE floating-point hardware for that precision and lower. If the adjustable precision is implemented entirely in software or if the storage requirements are enormous, such as with large matrix problems, it might be worth using as few as about 7 digits. Anything below that entails greatly increased risk of a useless first iteration. On the other hand, if the desired number of significant digits is no more


than about 30 digits, it might be most efficient overall to start with slightly more than that in hope of completion on the initial iteration.

The default result accuracy can be quite modest. Any engineer or scientist trained in the era of 10-inch slide rules can attest that 3 correct significant result digits are sufficient for many purposes. Moreover, anyone viewing lengthy formulas containing approximate numbers can attest that initially displaying more than a very few significant digits makes the formula less intelligible. However, the key ideas here are that the default displayed digits should be correct to within a few units in the last place and that users can request any positive relative and absolute interval widths that they desire.

Project:

Devise and implement efficiently computed estimates for step b).
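As a rough sketch of the outer doubling loop (inefficiencies and all), the following Python code uses the mpmath library, whose iv context provides interval arithmetic with outward-rounded bounds; the function name and parameter defaults are my own illustrative assumptions, not part of any CAS:

```python
# Sketch of the precision-doubling strategy: re-evaluate an interval
# expression with doubled significand length until the guaranteed
# bounds are within the requested absolute width.
from mpmath import iv

def adaptive_eval(expr, abs_width, start_bits=53, max_bits=53 * 1024):
    bits = start_bits
    while bits <= max_bits:
        iv.prec = bits                 # interval significand length, in bits
        result = expr()                # an iv.mpf with outward-rounded bounds
        if result.b - result.a < abs_width:
            return result
        bits *= 2                      # Kreinovich & Rump: total work is at
                                       # most ~5.25x the final iteration alone
    raise ArithmeticError("requested interval width not achieved")
```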

It is also possible to allow interval endpoints to be symbolic constants such as π + √2. However, we would also have to use adaptive interval arithmetic on such constants to compare them with each other when combining intervals. Also, we might want to replace such symbolic expressions with appropriately directed floating-point bounds when the expressions exceed a certain complexity.

Note that even an adaptive-precision interval-arithmetic package wouldn't meet the above computer algebra goals if it is stand-alone, because it would approximate all of the components rather than use the previously-described automatic mode. Moreover, such packages are totally numeric so that, for example, they can't produce polynomials having interval coefficients. Also, the need to switch contexts and re-enter the whole problem into a separate software package, then perhaps send those results back to the computer algebra system, is a deadly deterrent to its use after a failed computer algebra attempt at an exact result. Only the most determined and experienced users would go to that trouble — and only for very important problems. Thus, the adaptive-precision guaranteed-accuracy approximate arithmetic must be built into the computer algebra system.

IEEE single precision has a significand of about 6 significant digits and a minimum non-zero magnitude of about 10^(−38). Results of this precision and range are more than adequate for most purposes. If the default requested adaptive relative and absolute interval widths were set accordingly, adaptive affine interval arithmetic together with self-validating algorithms would usually be no more than a very few times slower than IEEE double precision non-interval floating-point arithmetic that comes with no result guarantees. Even with requested relative and absolute interval widths that correspond to IEEE double precision arithmetic, the computing times should be quite acceptable for most applications. The astounding increase in computer speed and memory size since floating-point arithmetic was first implemented makes it affordable to use interval arithmetic and self-validating algorithms for almost all approximate scientific computation. It should be the default approximate arithmetic — especially in computer algebra where the emphasis is on results that are as exact as is practical.

For tasks where an acceptably efficient interval or self-validating algorithm hasn't yet been devised, there are alternatives better than bare approximate arithmetic:

a) For adjustable precision approximate arithmetic the system can automatically compare results computed at successively higher precisions until the estimated relative or absolute error is significantly less than the default or requested values.

b) Maple [12] offers a ScientificErrorAnalysis package that uses differentiation to propagate uncertainty.


c) For its software floating-point arithmetic, Mathematica [14] automatically uses significance propagation, wherein you can query the estimated number of significant digits of a result with the Precision[...] function. Moreover, the N[...] function adaptively increases the working precision until the result has the requested estimated precision, resources are exhausted, or a user-defined maximum precision increase is achieved.

Wherever numbers in results don't have guaranteed error bounds, the numbers could have a visible indication of their caveat emptor. For example, they could have a prefix "∼" or a postfix "?", or a different color.

Built-in adaptive interval arithmetic helps exact arithmetic too. For example, if we can determine the sign of c = e^(π√163) − 262537412640768745, then we can simplify c − |c| to 0 if c is non-negative or to 2c if c is negative. It turns out that

e^(π√163) ∈ [262537412640768744.9999999999992, 262537412640768744.9999999999993].

Therefore adaptive interval arithmetic enables us to guarantee that c is negative, even though it is barely so. However, if
$$c = \frac{\dfrac{\pi}{4} + \arctan\dfrac{1}{239}}{4\arctan\dfrac{1}{5}} \;-\; \frac{\sqrt{22+2\sqrt{5}}+\sqrt{5}}{\sqrt{16-2\sqrt{29}+2\sqrt{55-10\sqrt{29}}}+\sqrt{11+2\sqrt{29}}}$$
and the CAS isn't powerful enough to simplify that to 0, then this method would take forever to determine sign(c). It has been proven that there will always be constant expressions equivalent to 0 that neither computers nor humans can guaranteeably simplify to 0 in a finite amount of time. Therefore after a predetermined amount of effort an implementation of interval sign determination would have to do something such as ask the user whether or not to continue trying, or else issue a warning and proceed without that simplification. Either way, the implementation could interruptably continue working on the problem while the user views the partially-simplified result.

Adaptive interval arithmetic and self-validation also enable us to obtain correct high-resolution plots of functions, equations and inequalities without misleading artifacts such as connected discontinuities, truncated extrema, and aliasing. See, for example, Avitzur et al. [1], Fateman [4], Martin et al. [15], Shou et al. [21], and Tupper [22].
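Returning to the first c above: using the adaptive_eval sketch from earlier in this section (still with mpmath's iv context), a couple of precision doublings suffice to guarantee its sign:

```python
# Guarantee the sign of c = exp(pi*sqrt(163)) - 262537412640768745
# with the adaptive_eval sketch above; |c| is roughly 7.5e-13.
from mpmath import iv

c = adaptive_eval(lambda: iv.exp(iv.pi * iv.sqrt(163)) - 262537412640768745,
                  abs_width=1e-13)
assert c.b < 0   # both guaranteed endpoints are negative, so c < 0
```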

5 Display Intervals Intelligibly, Concisely and Flexibly

To display the endpoints of an interval having non-zero width, we should frugally display the minimum number of digits of both endpoints for which the two numbers differ when rounded outward at the last digit. Displaying more digits than that is more of a distraction than a benefit. However, even such a frugal interval endpoint display is cluttered enough to discourage the use of interval arithmetic. For example, we might see

[57837979.74135, 57837979.74138] · x⁹ + [3.2799999, 3.2800000] · x⁸ + …

or adaptively using scientific notation where it is more intelligible:


[5.783797974135 E7, 5.783797974138 E7] · x⁹ + [3.2799999, 3.2800000] · x⁸ + …

For intelligibility, most users would prefer to see this example displayed as a midpoint rounded to a place corresponding to the interval width, ± a one-significant-digit rounded-up maximum absolute deviation, such as

(5.783797974136 E7 ± 2 E−5) · x⁹ + (3.28 ± 1 E−7) · x⁸ + …

Another possibility is to display the differing digits as a subscript and superscript:
$$5.78379797413^{\,8}_{\,5}\,\mathrm{E}7 \cdot x^9 + 3.2^{\,8000000}_{\,7999999}\cdot x^8 + \ldots$$

However, many users would prefer even more to see this example displayed as simply

5.78379797414 E7 · x⁹ + 3.2800000 · x⁸ + …

Here each interval is displayed as a midpoint k, rounded to a place that corresponds to the interval width, with an implicit maximum absolute deviation from k of no more than 1 unit in the last place. INTLAB [19] does this but also appends underscores to remind the user that the number is an interval with a radius of 1 unit in the last displayed digit. The number of underscores is the number of additional digits that would otherwise appear for that number and display format, thus preserving column alignment for tabular output. Within a computer algebra formula it is more concise to use exactly one underscore. Thus adapted to computer algebra, our result could appear as:

5.78379797414_ E7 · x⁹ + 3.2800000_ · x⁸ + …

Another idea is to follow the interval calculation with a non-interval nominal calculation done at the same precision as the interval calculation, and report that result with trailing digits grayed according to its maximum distance from either interval endpoint. Intervals can be quite asymmetrically conservative, so the nominal calculation seems likely to be more accurate than the interval midpoint. Moreover, we can use algorithms that usually quickly produce high accuracy results but aren't appropriate for interval versions. We could even use the above-mentioned error-guessing technique to decide how many grayed digits to display. An optimist would appreciate the extra digits even though they aren't guaranteed. Hyvönen [7] discusses additional alternatives.

Whatever notation is the default, it would be helpful if whenever you move the cursor over such an approximate number in a result, you could have a scrollable pop-up box displaying alternate presentations, and optionally replace the displayed notation with one of these alternatives.
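A rough Python sketch of the midpoint ± one-digit-deviation display rule, using Python's decimal module and string endpoints; the helper name is hypothetical, exact output formatting would differ in a real CAS, and equal endpoints are not handled:

```python
# Sketch: display an interval as midpoint +- a one-significant-digit,
# rounded-UP deviation, so the displayed interval still encloses the true one.
from decimal import Decimal, ROUND_CEILING

def frugal_display(lo, hi):
    lo, hi = Decimal(lo), Decimal(hi)          # assumes lo < hi
    radius = (hi - lo) / 2
    quantum = Decimal(1).scaleb(radius.adjusted())   # one significant digit
    dev = radius.quantize(quantum, rounding=ROUND_CEILING)
    mid = ((lo + hi) / 2).quantize(dev)              # midpoint to matching place
    return f"{mid} ± {dev}"

print(frugal_display("57837979.74135", "57837979.74138"))
# -> 57837979.74136 ± 0.00002   (cf. 5.783797974136 E7 ± 2 E−5 above)
```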

6 Display Exact Numbers Intelligibly and Flexibly

Sometimes we immediately apply a function such as the Maple evalf(...) function or Mathematica N[...] function to an exact result merely because an exact numeric numerator and/or denominator therein is so lengthy that we can't view and comprehend the entire result at once or can't easily judge the numeric magnitude. Unfortunately, this hammers all of the numbers in the expression into floating-point, including even the simple ones that are more intelligible in exact notation, such as perhaps even simple integer or fractional exponents.


Imagine instead that whenever you move the cursor over an exact number in a result, you can have a scrollable pop-up box showing several useful alternate presentations of the number. For example, if the number is the ratio of two integers, these alternate representations could include:

a) Display the number as an integer plus a proper fraction (making the approximate magnitude more obvious if it is at least 1).

b) Display the number as a decimal fraction, indicated exactly by underlining the repeating part of the fraction if there is one. If the period of the repetition is excessive, the portion after the first few digits of the period could be represented by an underlined ellipsis, susceptible to replacement with all of the elided digits by a click. (We could also allow underlined trailing fractional digits to denote repetition within input.)

c) Display the number in factored form. Composite factors that don't split almost immediately could temporarily be displayed underlined while interruptable computation to split them proceeds.

d) Display the number in scientific notation, using an easily-adjusted initially-modest number of digits in the significand. The significand could be terminated by an ellipsis to indicate that the exact value is used internally.

e) Display the number as a continued fraction, which can exhibit informative patterns.

There would be an option to replace the ratio in the display with any of the alternatives. However, the exact ratio could be preserved and used if the expression is employed in further results. Optionally, an interval associated with the scientific notation could be used instead in further results. There could be an option to have the entire original result re-displayed (or better yet initially displayed) with each number therein displayed in whatever form is most concise.

Many of these options could be provided too for highlighted exact irrational numeric sub-expressions. For example, e has the remarkable continued fraction:
$$e = 2 + /\!/1, 2, 1, 1, 4, 1, 1, 6, \ldots, 1, 1, 2n, \ldots/\!/ = 2+\cfrac{1}{1+\cfrac{1}{2+\cfrac{1}{1+\cfrac{1}{1+\cfrac{1}{4+\cdots}}}}}$$

As another example, Albert Rich [private communication] suggests that the default display 10 · 3^(2/5) · 20^(1/3), in which factors having the same proper fractional exponent are collected together and their bases multiplied out, also has the useful prime decomposition alternative 2^(5/3) · 3^(2/5) · 5^(4/3) and a useful perfect root alternative 2332800000000000000000000^(1/15). He calls such irrational numbers that can be represented as p^q with p and q rational absurd numbers, in keeping with the tradition started with whimsical nomenclature such as imaginary numbers, radicals, irrational numbers, surreal numbers, and surds. There are even more useful alternatives when p isn't an integer, including choices about distributing exponents over ratios and about rationalizing denominators or numerators.

Another idea for the optional pop-up menu is to reveal if a result is a binomial coefficient, a Fibonacci number, the sum of two cubes, etc. Such information might provide a key insight. Towards this goal, Mathematica has a PowersRepresentations[...] function that can determine representations of an integer as a sum of powers. For example, it reveals that 1729 = 9³ + 10³ = 1³ + 12³.
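As one illustration, the continued-fraction alternative (e) for an exact ratio of integers takes only a few lines in Python (the function name is mine):

```python
# Sketch of pop-up alternative (e): the continued-fraction terms of an
# exact rational, which always terminate.
from fractions import Fraction

def cf_terms(q):
    terms = []
    while True:
        a, frac = divmod(q, 1)        # integer part and fractional part
        terms.append(int(a))
        if frac == 0:
            return terms
        q = 1 / frac                  # continue with the reciprocal

print(cf_terms(Fraction(355, 113)))   # -> [3, 7, 16]
```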


7 Infinitesimally-perturbed Numbers

Univariate optimization problems have long been used as the most common type of example to motivate differential calculus, because examples such as maximizing profit and minimizing cost or environmental damage are of obvious utility. Consider how we might specify the inputs and results of computer algebra functions for solving such optimization problems using the techniques taught in calculus:

The techniques taught in calculus entail comparison of objective values and/or one-sided limits at critical points and at endpoints of the interval of interest. These endpoints are ∞ and −∞ if the interval of interest is the entire real line. The critical points are where the objective function changes between real and non-real, places where it is not continuous with a continuous first derivative, and places where the first derivative is 0. The techniques can also entail determining the sign of the first non-zero higher-order derivative at the zeros of the first derivative.

First consider a monotonically strictly increasing function f(x) over the entire real line, such as arctan(x). It has a supremum, meaning a least upper bound, at x = ∞, but it is traditionally said to have no maximum. In the context of optimization and exact results, it would be incorrect and misleading to return any finite real approximation such as x = 10⁹⁹. Therefore a function named maximizers should indicate that there is no maximum for this problem, which doesn't satisfy our optimization needs. Consequently, we need extended numbers that include ∞ and −∞; and we should name our function supremumizers. (I welcome suggestions for a less awkward name. leastUpperBoundizers is even more awkward, and maximizers already has a widely accepted narrower definition.)

Thus consider a function supremumizers(u, x) that, given an objective expression u and a control variable x, attempts to return the location of all of the real values of x where u has the same global supremum. If we want to restrict the domain of interest, we can use the syntax

supremumizers(u, x) | B,

where B is a Boolean expression such as (3 < x ≤ 7 ∧ x ≠ 5 ∨ x > 9) ∧ c > 0. As illustrated by this constraint example, for generality:

a) The domain of interest can be disjoint.

b) We permit either open or closed endpoints on constraint intervals.

c) In the objective expression and constraint we permit additional variables that are parametric indeterminates, because we might nonetheless be able to express the locations of the supremumizers, perhaps in terms of some such parameters. For example, the global supremum of c · (x − a)² + k | c < 0 occurs at x = a.

There might be more than one value that produces the same global supremum, and the supremumizers might include possibly open intervals of non-zero width over which the objective function is constant. Therefore simply returning a finite list of x values or even a single interval isn't general enough. Therefore let's have supremumizers(...) return a Boolean expression typically containing equations and/or inequalities that specify the supremumizing x values in the most explicit and simplest possible way. Example returned values are:

x = 1/2,


x = π + √2,
x = a ∨ (c > 0 ∧ 0 ≤ x < 1),
false,
true.

"false" indicates that no values of the control variable supremumize the objective expression, such as for an objective expression that is non-real throughout the domain of interest. "true" indicates that all real values of x throughout the domain of interest produce the same value for the objective expression, such as for a constant objective expression.

A Boolean return specification also facilitates returning an exact implicit result when we can't determine an explicit result. For example a result might entail an algebraically irreducible polynomial that contains a parameter, making even iterative approximate methods inapplicable, such as for the real globally optimizing solutions of x⁵ + c · x + k = 0. The real solutions of such an equation might not all correspond to global supremumizers, so in such cases, we can claim only that the implicit result merely includes promising supremumizer candidates.

However, even the above return specification for supremumizers(...) isn't general enough for our exact optimization purposes. For example, consider the objective expression 1/(x − 2) on the real line. This expression has a supremum of ∞ at x = 2. However, if we return x = 2, then a user should obtain ±∞ rather than ∞ when substituting x = 2 into 1/(x − 2). This is particularly problematic if this resulting objective value is automatically used in further calculations. We could return an approximate value such as 2.000000001, but the goal here is exact results, and substituting such an approximation into the objective function doesn't produce the optimum objective value of ∞ that is exactly expressible as such with the infinity-extended arithmetic that is offered by most computer algebra systems.

Difficulties can arise even with continuous objective expressions if the global supremum occurs at an open endpoint of the domain of interest. For example, with

supremumizers(4 − x, x) | x > 2,

4 − x strictly monotonically approaches 2 from below as x approaches 2 from above, but a returned result x = 2 violates the input constraint, and a returned result such as x = 2.000000001 isn't exact. Thus for both examples we need to indicate that the optimal x is infinitesimally to the right of x = 2 and that the user should instead compute, for example, lim_{x→2⁺} 1/(x − 2) to obtain the corresponding objective value.

The standard notation for one-sided limits suggests an elegant solution to this dilemma: We can return x = 2⁺ for these two examples. For greater faithfulness to the limit notation we could instead return x → 2⁺. However, there is no need here to thus complicate the computer algebra system and squander the precious symbol "→" that is useful for other purposes. The "+" superscript is sufficient to indicate that we mean "x = 2, infinitesimally perturbed to the right".

With modern non-standard analysis, hyper-real numbers and surreal numbers, infinitesimals are coming back into fashion. Therefore, without embarrassment we can call numbers such as 2⁺ and 2⁻ infinitesimally perturbed. The "+" and "−" superscripts can be regarded as postfix operators. One way to internally represent a number such as 2⁺ is similar to however we represent any other function of 2, but with an appropriately unique name such as "perturbedRight(2)". This is a simple generalization of an idea that is already present in IEEE floating-point arithmetic [8] only for the number 0.0.


A user of a result such as x = 2⁺ might be another function or program that invokes supremumizers(...), then, for example, automatically determines the corresponding objective value. If so, it is a cruel, inefficient and unreliable use of human labor to require the programmer of every such procedure to awkwardly test for a "+" or "−" postfix operator then branch accordingly to a substitution or an appropriate one-sided limit. Thus to help automate use of such perturbed numbers, including determination of the corresponding objective value, we should make "substitutions" such as 1/(x − 2) | x = 2⁺ automatically compute the limit to return ∞. For an unperturbed number, substitution would continue to behave in the customary manner. For example, 1/(x − 2) | x = 2 → ±∞.

It is also important to implement arithmetic for multi-valued constants such as ±∞ rather than degrade them to undefined, because a subsequent calculation such as arctan(|±∞|) → π/2 can enable us to re-enter the more manageable world of finite single-valued expressions. The next section shows how multi-intervals can represent multi-valued constants and operate properly on them with no extra required programming.

Rather than a composite expression, the objective expression is often a function invocation f(x), where f is the name of either a built-in or user-defined function. Thus we should also make the function invocation f(2⁺) automatically compute the limit to return ∞. In contrast, f(2) should automatically be evaluated using classic substitution. For example, with signum(x) := if x = 0 then 0 else |x|/x, signum(0⁻) = −1, signum(0) = 0, and signum(0⁺) = 1, whereas signum(1⁻) = signum(1) = signum(1⁺) = 1. Or, perhaps there are only two distinct values among f(n⁻), f(n) and f(n⁺), such as for ⌊x⌋ at any integer n.

This notation and treatment for infinitesimally perturbed numbers permits evaluation of the notation f(...) to more thoroughly exhibit the behavior of a function in a unified way, and similarly for the substitution notation u | x = .... Generalizing numbers to include infinitesimally-perturbed numbers unifies substitutions and function evaluation with limits, and these perturbation notations are more concise than the traditional limit notation. (The advantage of concise notations is enormous. For example, it has been said that vector notation enabled Maxwell's equations, and tensor notation together with Einstein's tensor summation convention enabled his general relativity theory. As another example, complex numbers have enabled too many mathematics, engineering and science advances to list here.)

More generally, either by using an appropriate part selection function or by cutting and pasting or by direct entry, a user might want to use 2⁺ anywhere that it is reasonable to do so. For example, a user might want to simplify directly 1/(2⁺ − 2) to ∞. For this to work for any expression that we might want to optimize, every applicable built-in function and operator should produce a corresponding perhaps-perturbed result when given one or more numeric perturbed inputs. Moreover, to allow for computations done in several stages, these perturbations should be preserved in all intermediate, final and assigned results. For example, substituting the supremumizer x = 2⁺ of supremumizers(4 − x, x) | x > 2 into 4 − x should yield 2⁻ rather than 2.
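The dispatch itself is simple. Here is a sketch using SymPy's one-sided limits to emulate such a generalized substitution; the wrapper function and its eps convention are hypothetical, not part of SymPy:

```python
# Sketch: substitution that dispatches to a one-sided limit when the
# substituted value carries an infinitesimal perturbation tag.
from sympy import Symbol, limit

x = Symbol('x')

def gen_subst(u, var, value, eps=0):      # eps in {-1, 0, +1}
    if eps != 0:
        return limit(u, var, value, dir='+' if eps > 0 else '-')
    return u.subs(var, value)             # classic substitution

print(gen_subst(1/(x - 2), x, 2, eps=+1))   # -> oo, i.e. 1/(x-2) | x = 2+
print(gen_subst(4 - x, x, 2, eps=+1))       # -> 2; the CAS ideal discussed
                                            #    above would preserve it as 2-
```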


However, even though most operations won't generate perturbed results unless given perturbed inputs, it might be politic to provide the option of suppressing display of spontaneous perturbations for students at the more elementary mathematics levels.

Infinitesimally perturbed numbers similarly help the solve(...) function return correct results for a broader class of equations. For example, if solve(1/(x − 2) = ∞, x) → x = 2, then the substitution 1/(x − 2) | x = 2 → ±∞, which doesn't satisfy the given equation. In contrast, if solve(1/(x − 2) = ∞, x) → x = 2⁺, then the generalized substitution 1/(x − 2) | x = 2⁺ → ∞, which satisfies the given equation.

As already mentioned, the IEEE floating-point standard [8] mandates a separate representation for 0.0⁻ and 0.0⁺, with appropriate rules, such as 1.0/0.0⁻ → −∞; and good libraries for floating-point elementary functions further exploit this by rules such as sin(0.0⁻) → 0.0⁻. Unfortunately, the IEEE standard also uses 0.0⁺ for the unperturbed 0.0, which most usefully means the 0.0 strictly between 0.0⁻ and 0.0⁺. Therefore, incorrectly, −1.0/(5.0 − 5.0) → −1.0/0.0⁺ → −∞ rather than ±∞. There were of course tight constraints on the IEEE number representation and a strong desire for fast, easily-implemented floating-point hardware. These considerations are of little importance for computer algebra, where we are implementing our own arithmetic in software anyway: We might as well have the full generality of optional + and − postfix operators allowable after any number, including floating-point numbers. For example, it is extra information to know the tighter result x < b rather than x ≤ b even if b is a floating-point number. Compared to the cost of everything else in adjustable-precision floating-point, there would be negligible run-time cost even in the relatively rare instances where these operators do occur in expressions.

Infinitesimally perturbed numbers are also helpful for implementing a limit function. For example, such numbers make mere extended evaluation suffice for computing
$$\lim_{x\to 9^+}\left(\frac{\ln(x-9)}{x-9} + x^{99}\right) \;\to\; -\infty/0^+ + (9^+)^{99} \;\to\; -\infty.$$

If the only supported perturbed numbers are 0⁺ and 0⁻, then a limit implementation must either have special rules for singularities at finite non-zero limit points or must shift such limit-points to the origin to capture and retain perturbation information. Such shifts can significantly complicate the limitand expression by forcing expansion and/or common denominators involving high-degree polynomials and computation of their greatest common divisors. Worse yet, computer algebra systems that support no perturbations must use either special rules or a reciprocal transformation to shift finite limit points to ∞ or −∞ for this purpose, which can even more greatly complicate the limitand.


With perturbation postfix operators there is no need to use a parenthesis to indicate an open interval endpoint versus a square bracket to indicate a closed interval endpoint. Therefore we can use brackets for both types. For example, [0⁺, 1] indicates an interval that is open on the left and closed on the right, denoted (0, 1] in the visually disturbing traditional notation that also significantly complicates parsing. Furthermore, the interval [b⁻, b⁺], meaning an element of the set {b⁻, b, b⁺}, can't be expressed gracefully in the traditional notation: (b), (b], and [b) denote an element of the empty set, whereas [b] is the one element of the set {b}. We could arguably represent this infinitesimal interval as ")b(", but that is even more visually disturbing and problematic for parsing.

If users are provided a mechanism for entering the perturbation operators, then we must decide a few syntactic issues: Note that although these perturbation operators are displayed as superscripts for increased clarity in this article, "+" and "−" can be used for these postfix operators even with one-dimensional input or display, provided parentheses are used if necessary to avoid misinterpretation as the usual infix or prefix arithmetic operators. For example, with the convention that "+" and "−" are interpreted as postfix operators only if no other syntactically correct interpretation is possible, (2−) − −3 means (2⁻) − (−3); and with implicit multiplication, (2−)3 means 2⁻ · 3. If we want to enable one-dimensional input to perturb negative numbers without having to parenthesize them, we could let these postfix operators bind just slightly less tightly than prefix "−" to their operands. For example, "−2+" would parse as (−2)⁺ rather than as −(2⁺), which is equivalent to (−2)⁻.

Here are a few examples of obvious simplification rules that should be implemented for the built-in operators and functions, where n represents a positive integer whereas a and b represent unperturbed finite rational constants:

−(b⁺) → (−b)⁻   (1)
a⁺ + b → (a + b)⁺   (2)
a⁺ + b⁺ → (a + b)⁺, if a ≠ −b   (3)
a⁺ − a⁺ → [0⁻, 0⁺]   (4)
a⁺ + b⁻ → [(a + b)⁻, (a + b)⁺]   (5)
0 · (b⁺) → 0   (6)
a · b⁻ → (a · b)⁺, if a < 0   (7)
1/0⁻ → −∞   (8)
1/0 → ±∞   (9)
1/b⁺ → (1/b)⁻, if b ≠ 0   (10)
(b⁻)ⁿ → (bⁿ)⁻, if 1 = mod(n, 2)   (11)
⌊b⁻⌋ → ⌊b⌋ − 1, if b ∈ ℤ   (12)

Note that by rule (5), a⁺ + b⁻ simplifies to an interval of infinitesimal width, [(a + b)⁻, (a + b)⁺]. An optimist might assume that the oppositely-signed perturbations cancel giving a + b, but we shouldn't assume that the two perturbations are 100% correlated except in special circumstances.

Users might also want to apply perturbation operators to an expression that isn't simply an unperturbed finite rational constant. If the expression is a rational combination of rational numbers, some of which might be perturbed, then bottom-up simplification according to rules such as the above yields a single rational number, perturbed rational number, infinite magnitude, or interval, to which we can apply rules such as:


(−∞)⁺ → −∞   (13)
(b⁻)⁻ → b⁻   (14)
(b⁻)⁺ → [b⁻, b⁺]   (15)
[b⁻, b⁺]⁺ → [b⁻, b⁺]   (16)

Some rules such as (1) through (16) are also applicable when a and/or b are exact irrational constants such as π or √2. For example, we could use the transformation π⁺ + 3 → (π + 3)⁺. As illustrated with this example, this rule can make the expression slightly bulkier. Therefore it is natural to consider using the transformations in the opposite direction. However, for most operators with more than one operand, there is more than one way to distribute the perturbation. For example, we could do (π + 3)⁺ → π⁺ + 3, (π + 3)⁺ → π + 3⁺, or (π + 3)⁺ → π⁺ + 3⁺. This non-uniqueness makes it more canonical to transform perturbations toward the top level of an expression rather than down toward the lower levels. Also, computation of limits is easier if perturbations are propagated toward the top level.

We can also devise rules for some irrational functions. For example, where f is a continuous strictly monotonically increasing function of its one argument, we can use the rule:

f(b⁺) → (f(b))⁺   (17)

For example, arctan(u⁺) → (arctan(u))⁺ for all real u.

Some rules such as (1) through (17) apply even if a and/or b are expressions containing indeterminates. However, for example:

a) Rule (6), 0 · (b⁺) → 0, is inapplicable if an indeterminate in b precludes determination that it has a finite magnitude.

b) Rule (7), a · b⁻ → (a · b)⁺ if a < 0, is inapplicable if an indeterminate in a prevents determination of its sign.

Non-real numbers in rectangular form can have their real part and/or imaginary part optionally perturbed either way. Non-real numbers in polar form can have their radius and/or angle optionally perturbed either way.

8 Multi-interval Arithmetic

If a < 0 ≤ b, then 1/[a⁺, b] = [−∞, (1/a)⁻] ∪ [1/b, ∞]. With only contiguous intervals, we must degrade this to [−∞, ∞]. Such degradations can lead to unnecessarily wide result intervals. For example, 1/[−∞, ∞] → [−∞, ∞], whereas 1/(1/[a⁺, b]) ≡ [a⁺, b].

More generally, it is possible for the result of a function or operator to be any number of disjoint intervals even if all of the inputs are contiguous intervals. For example, consider the function f defined by f(x) = ⌊x⌋ + x/2, which is a slippery staircase with each tread having slope 1/2 separated by risers of height 1. Regarding intervals as designating sets,

f([0, 3⁻]) → [0, (1/2)⁻] ∪ [3/2, 2⁻] ∪ [3, (7/2)⁻].


With only contiguous intervals, we must degrade this to [0, (7/2)⁻].

As noted by Jeffrey and Norman [11], it is more problematic to compute with a set of values than with an arbitrary element thereof. Moreover, the latter interpretation entails greater argument-type consistency between, for example, f(3) and f([5, 7]). Therefore it seems preferable for most purposes to regard an interval as an unknown element of a set.

Degradations such as the above example are unnecessary if we generalize our intervals to multi-intervals, also known as interval sets. A multi-interval is defined as a lexically-ordered tuple of elements, ⟨element1, element2, …⟩, with each element being a number or an interval. The numbers and interval endpoints can be any mixture of exact and approximate real and non-real numbers, perturbed or not, together with infinities. One possible lexical ordering for rectangular elements is according to non-decreasing order of the real parts of their lower left endpoints, with ties broken according to non-decreasing order of the imaginary parts, with further ties broken similarly according to the upper right endpoints. Another possible representation for a non-real multi-interval is as a real multi-interval plus i times another real multi-interval. However, this might force some degradation. For example, ⟨[0, 1 + i], [2 + 2i, 3 + 3i]⟩ with area 2 degrades to ⟨[0, 1], [2, 3]⟩ + ⟨[0, 1], [2, 3]⟩ · i, with area 4. A tuple of annular segments is similarly more frugal than a real multi-interval times an exponential of i times a real multi-interval.

A simplified multi-interval is one that can't be represented with fewer elements by combining included, overlapping or adjacent elements.

A significant additional advantage of multi-intervals is that they also provide a general representation for multi-valued constants such as ±∞ → ⟨−∞, ∞⟩ and zeros(x⁴ = 1, x) → ⟨−1, −i, i, 1⟩, together with automatic correct arithmetic for all of them. For input and display we can still use more compact notations such as "±" where applicable.

Because of varying-precision numbers and symbolic expressions having many different operators and function names at the top level, computer algebra data is already of varying size, with types typically identified by explicit tags. Therefore, there is no need to represent every ground-domain element in its most general form as a multi-interval. More specifically, we can represent rational numbers, irrational numbers, complex numbers, perturbed numbers, and intervals as such, and an expression can contain a mixture thereof. It would be unnecessarily space and time inefficient to do otherwise, because increasingly complicated types are increasingly rare. In fact, whenever possible, we should simplify a multi-interval to an interval, a complex number to a real number, and an interval to a number or a perturbed number.

An equation such as x = ⟨[−∞, 2⁻], π, [4, 5⁻], [5⁺, 6⁻]⟩ can alternatively be represented as a Boolean combination of comparisons such as x < 2 ∨ x = π ∨ 4 ≤ x < 6 ∧ x ≠ 5. However, having x "factored out" makes the interval representation easier for many purposes, and it is quite useful to be able to represent such a set of numbers or an arbitrary element thereof independent of any problem-specific or dummy variable.

It is also possible to allow multi-intervals to contain indeterminates. However, combining such intervals is likely to yield unwieldy results.
Therefore, a Boolean combination of equations and inequalities is likely to be a less awkward representation in such cases. Moreover, a Boolean representation more gracefully represents implicit specification of the possible values of a variable or a set of coupled variables, such as in x^5 + x·y + 2·y^5 = 0 ∧ e^{x·y} > ln(x + y + c).

Multi-intervals obey obvious rules of arithmetic: for example, the sum of two multi-intervals is the simplified multi-interval representing all possible sums of one element from the first multi-interval and one element from the second.


This can be done by looping or recurring through the elements of one multi-interval inside a loop or recursion over the elements of the other, merging successive new elements into a cumulative result, combining elements when possible, and exploiting the fact that the inputs are ordered. Depending on mergers, if the inputs have m and n elements, the result can have anywhere from 1 through m·n elements. For example (sketched in code at the end of this section):

⟨[0, 1], 3^+⟩ + ⟨[5, 6], [8, 9]⟩ → ⟨[5, 7], [8, 10], [11^+, 12^+]⟩

Mathematica [14] supports real multi-intervals in which all of the elements are intervals having unperturbed closed endpoints. The Maple [12] evalr function simplifies expressions containing such multi-intervals, and some additional Maple interval arithmetic capabilities are available from the MapleSoft Application Center [13].

Although multi-intervals can contain both numbers and intervals, it is worth trying rather hard to avoid, in results, approximate numbers that aren't the endpoints of intervals. For example, consider the multi-interval ⟨[1.2…, 12.3…], 1.24…⟩. What good are the guaranteed bounds of the first element if the second element, 1.24…, has no error bounds and could represent a value that is actually less than the lower bound of the first element?
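
Returning to the addition rule above, here is a minimal sketch (the helper name mi_add is mine; only plain closed float endpoints are modeled, with no open endpoints, perturbations or infinities) that forms the m·n pairwise sums and then merges overlapping or adjacent pieces into a simplified multi-interval.

    def mi_add(p, q):
        """Sum of two multi-intervals, each given as a sorted list of
        closed real intervals (lo, hi): form all pairwise sums, then
        merge overlapping or adjacent pieces so that the result is a
        simplified multi-interval."""
        sums = sorted((a + c, b + d) for a, b in p for c, d in q)
        merged = [sums[0]]
        for lo, hi in sums[1:]:
            if lo <= merged[-1][1]:              # overlaps or touches: merge
                merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
            else:
                merged.append((lo, hi))
        return merged

    # <[0,1], [3,3]> + <[5,6], [8,9]>, with 3 standing in for 3^+:
    print(mi_add([(0, 1), (3, 3)], [(5, 6), (8, 9)]))
    # [(5, 7), (8, 10), (11, 12)]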

9 Generalized Limits

Computing limits also benefits from the above general multi-intervals. Consider the following teachings in most calculus texts:

a) We are taught recursive rules for limits, such as the limit of a sum is the sum of the limits, together with l'Hôpital's rule and truncated series for indeterminacies such as 0 · ∞, and various transformations to convert other indeterminate forms such as ∞ − ∞ into this form.

b) We are taught that a limit is defined only if it is a unique single value. For example, the following are regarded as undefined:

    lim_{x→∞} sin(x),        lim_{x→0} x^{−1}.

If we strictly obey these rules when implementing a computer algebra limit function, then we would compute respectively:

a) lim_{x→∞} (sin(x) · e^{−x}) → undefined · 0 → undefined, rather than 0.

b) lim_{x→0} e^{−(x^{−1})^2} → … → e^{−(lim_{x→0} x^{−1})^2} → e^{−undefined^2} → e^{−undefined} → e^{undefined} → undefined, rather than 0.

For such examples, without clearly stating that they are operating outside their own official rules, the textbook authors effectively compute generalized limits respectively:

a) Use the interval [−1, 1] for glim_{x→∞} sin(x), then compute [−1, 1] · 0 → 0.

b) Use the multi-interval ⟨−∞, ∞⟩ ≡ ±∞ for glim_{x→0} x^{−1}, then compute e^{−⟨−∞,∞⟩^2} → e^{−∞} → 0.
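
The key step in example b) can be sketched directly: push the multi-valued inner limit through the outer function elementwise, then collapse duplicates. IEEE signed infinities model this adequately here (the function name glim_via_multivalue is mine, not an established API).

    import math

    def glim_via_multivalue(inner_values, outer):
        """Push a multi-valued inner generalized limit through an outer
        function elementwise, then collapse duplicates.  Real values and
        signed IEEE infinities only."""
        return sorted({outer(t) for t in inner_values})

    # glim_{x->0} x^(-1) = <-inf, inf>; push it through t |-> exp(-t^2):
    print(glim_via_multivalue([-math.inf, math.inf],
                              lambda t: math.exp(-(t * t))))   # [0.0]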


A textbook derivation might have an informal explanation beginning with the observation that sin(x) is bounded, and an informal argument for example b) too. This is common practice for a textbook, but such varied informal non-constructive derivations are awkward to automate in computer algebra, compared to formalizing and implementing a generalized limit that can return a multi-interval. Surely more students would become proficient at computing limits if educators openly stated that during intermediate calculations they are using a more general definition of limits that permits intervals and multi-valued results such as ±∞.

In fact, such generalized limits are more useful for final results too. For example, it is more useful to know that lim_{x→0^+} sin(1/x) remains bounded in the interval [−1, 1] than to receive a degraded result such as undefinedReal, meaning [−∞, ∞], or undefined, meaning [−∞ − ∞·i, ∞ + ∞·i]. The Mathematica Limit[…] function can return multi-intervals.

The traditional definition of a limit doesn't return infinitesimally perturbed numbers, multi-valued numbers, intervals or multi-intervals. Therefore, to avoid ruffling the sensibilities of calculus teachers and students, it might be politic to offer not only a generalized limit function but also a traditional limit function that uses the generalized limit function for all internal calculations, then degrades the result using rules such as

    degradeForLim(b^+)        → b             (18)
    degradeForLim([b^−, b^+]) → b             (19)
    degradeForLim([b, b^+])   → b             (20)
    degradeForLim([b^−, b])   → b             (21)
    degradeForLim([a, b])     → undefined     (22)
    degradeForLim(⟨…⟩)        → undefined     (23)
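
Rules (18)-(23) can be sketched in a few lines of Python, assuming a deliberately tiny tagged-tuple representation of my own devising (a real system would use its own ground-domain tags, and this sketch does not track the sign of a perturbation):

    UNDEFINED = "undefined"

    def value(x):
        """Underlying value of a plain number or of an infinitesimally
        perturbed number, with b^+ and b^- both modeled as ("pert", b)."""
        return x[1] if isinstance(x, tuple) and x[0] == "pert" else x

    def degrade_for_lim(r):
        """Rules (18)-(23): collapse perturbations and infinitesimally
        thin intervals to their core value; degrade anything genuinely
        multi-valued to undefined."""
        if isinstance(r, tuple) and r[0] == "pert":        # rule (18)
            return r[1]
        if isinstance(r, tuple) and r[0] == "iv":          # rules (19)-(22)
            lo, hi = r[1], r[2]
            return value(lo) if value(lo) == value(hi) else UNDEFINED
        if isinstance(r, tuple) and r[0] == "mi":          # rule (23)
            return UNDEFINED
        return r                                           # already a number

    print(degrade_for_lim(("pert", 3)))                        # 3
    print(degrade_for_lim(("iv", ("pert", 2), ("pert", 2))))   # 2, i.e. [2^-, 2^+]
    print(degrade_for_lim(("iv", 0, 1)))                       # undefined
    print(degrade_for_lim(("mi", [("iv", 0, 1), 3])))          # undefined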

Another way that the definition of a limit in most calculus texts is unfortunately restrictive is that if the limit point is approached along the real axis, the limit is typically regarded as undefined if it is non-real, or even if the limitand is infinitesimally non-real when perturbed in the direction from which the limit point is approached. For example, most calculus texts regard lim_{x→0^−} √x, hence also lim_{x→0} √x, as undefined, because √x is non-real for x < 0. This tradition is awkward to implement without glim, because the typical ways to implement limits don't entail epsilons and deltas that stray into the non-real domain for this example. In contrast, glim_{x→0^−} √x → 0^+ · i. Moreover, since glim_{x→0^+} √x → 0^+, glim_{x→0} √x → [0 + 0^+·i, 0^+ + 0^+·i], which is a complex interval of infinitesimal width and height, represented here in rectangular form by its lower-left and upper-right corners.

To obtain the traditional result for either lim_{x→0^−} √x or lim_{x→0} √x we can add the rule degradeForLim(non-real) → undefined.

As another example of the usefulness of generalized limits, using the principal branch of ln, as for √ above:

    glim_{x→0^−} 1/ln(x) → 1/(−∞ + π·i) → 0^− + 0^−·i,        (24)

and

    glim_{x→0^+} 1/ln(x) → 1/(−∞) → 0^−,

so

    glim_{x→0} 1/ln(x) → [0^− + 0^−·i, 0^− + 0·i],

which is useful information that degradeForLim(…) could degrade to undefined. I suspect that if glim(…) is also made conspicuously available to users and encouraged, then at least the more advanced users will come to prefer it to lim. Either way, glim is quite helpful for implementing a simpler and more powerful lim function.
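
As a quick numerical sanity check of (24), substituting a tiny negative x into 1/ln(x) with Python's cmath (which, like the √ example above, uses the principal branch of the logarithm) shows both the real and the imaginary part approaching 0 from below:

    import cmath

    x = -1e-300                 # a tiny negative x approaching 0^-
    print(cmath.log(x))         # approx (-690.776 + 3.14159j), i.e. -inf + pi*i
    print(1 / cmath.log(x))     # approx (-0.00145 - 6.58e-06j): 0^- + 0^-*i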

10 Optional Automatic Recovery of Exact Numbers

Suppose you measure a number c experimentally, or approximate it computationally because your computer algebra system can't compute it exactly. This could be because an appropriate algorithm isn't known or implemented in your system, or because you exhaust memory or your patience trying to compute it exactly.

a) If c = [5.1666665, 5.1666667], you would be justified in suspecting that the exact result is 5 + 1/6.

b) If c = [5.141592, 5.141593], you would be justified in suspecting that the exact result is 2 + π.

c) If c = [5.414213, 5.414214], you would be justified in suspecting that the exact result is 4 + √2.

In such cases, you might be inspired to measure or compute the result to greater accuracy to either refute your suspicion or gain further confidence. In the latter case you might then seek and find a proof, perhaps becoming famous!

Unfortunately, most of us recognize the leading decimal digits of only a few simple fractions and irrational numbers. For example, most of us wouldn't recognize that 0.94117647058802 is within 3 E−13 of 16/17, or that 4.188790204786024 is within 4 E−13 of 4π/3, or that 0.39038820320 is within 2 E−12 of (√17 − 1)/8. Fortunately, there are algorithms that can help identify such promising exact representations of approximate numbers.

Given a real number, the quotients from Euclid's algorithm can be used to recover a sequence of increasingly accurate rational approximations whose errors alternate in sign. This can be used to recover likely simple exact rational results from approximate ones, as sketched below.
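
A minimal sketch of that recovery (the function name rational_candidates is mine): run Euclid's algorithm on the approximate number, accumulate the continued-fraction convergents, and offer an early convergent with a small denominator as the candidate. For c ≈ 5.1666666 the second convergent is already 31/6 = 5 + 1/6.

    from fractions import Fraction

    def rational_candidates(c, max_terms=12):
        """Continued-fraction convergents of a non-negative approximate
        number c, built from the quotients of Euclid's algorithm; the
        convergents alternate around c, and a convergent with a small
        denominator that matches c to nearly full precision is a
        promising exact candidate."""
        x = c
        a = int(x)
        h0, k0, h1, k1 = 1, 0, a, 1       # standard convergent recurrence
        candidates = [Fraction(h1, k1)]
        x -= a
        for _ in range(max_terms):
            if x == 0:
                break
            x = 1 / x
            a = int(x)
            x -= a
            h0, k0, h1, k1 = h1, k1, a * h1 + h0, a * k1 + k0
            candidates.append(Fraction(h1, k1))
        return candidates

    print(rational_candidates(5.1666666)[:2])   # [Fraction(5, 1), Fraction(31, 6)]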

More generally, given a set of real constants, there are various algorithms, such as Ferguson and Bailey's PSLQ integer relation algorithm [5], that, given a tuple of m+1 real numbers r = ⟨r_0, r_1, …, r_m⟩, determine a minimum-norm tuple of integers ⟨n_0, n_1, …, n_m⟩ such that the inner product of the two tuples is 0 or approximately so, or determine that no such relation is computable at the precision being used. This can be used to recover exact irrational constants in many ways, such as the following two:

a) If you want to check whether a computed or measured real approximate result c is a rational linear combination of one or more given real constants, such as 1 together with π and e, take r_0 = c, r_1 = a good approximation to the first given constant, r_2 = a good approximation to the second given constant, etc. If the algorithm returns ⟨n_0, n_1, …, n_m⟩, then the rational linear combination is c ≈ −(n_1·r_1 + … + n_m·r_m)/n_0. Moreover, you can do this separately for the real and imaginary parts of a computed or measured complex approximate result.

b) If you want to check whether a computed real approximate result c is very nearly a zero of an mth-degree polynomial having rational coefficients, take r_0 = 1, r_1 = c, r_2 = c^2, etc. If the algorithm returns ⟨n_0, n_1, …, n_m⟩, then the polynomial is n_m·x^m + … + n_1·x + n_0. If c is very nearly a real zero of an irreducible polynomial of degree < m, then one or more of the leading coefficients will be 0. Let n_k be the left-most non-zero coefficient. Then for 1 ≤ k ≤ 4 you could use the linear through quartic formulas to express c exactly and explicitly, and you could express c explicitly for some higher-degree polynomials that are binomials or that de-nest, for example. (You might also have to use interval arithmetic to determine which zero of the polynomial corresponds to c.) Otherwise you could at least express the exact c implicitly using a function such as the Maple RootOf(…) function or the Mathematica Root[…] function. However, many users might rather not see explicit recovered results that entail the general cubic or quartic formula.

Tolerances can be used to limit the allowable magnitudes of the resulting integer coefficients and of the difference between the resulting formula and c. Also, at the expense of more computing time, sensitivity to rounding error could probably be improved by first trying for a simple rational multiple, then a multiple of π alone, etc., then iteratively trying increasing m = 2, 3, ….

Exercise: Given an approximate number c and real constants r_1 through r_4, how could you use PSLQ to seek minimum-norm integers n_1 through n_4 such that c ≈ (n_1·r_1 + n_2·r_2)/(n_3·r_3 + n_4·r_4)?

Exercise: Explain how PSLQ can be used to determine integers a_0, b_0, a_1, b_1, …, a_m, b_m so that an approximate number c is approximately a real zero of the polynomial (a_m + b_m·π)·x^m + … + (a_0 + b_0·π).

Exercise: Given an approximate number c and real constants r_1 through r_3, how could you use PSLQ to seek minimum-norm integers n_1 through n_3 such that c ≈ exp(n_1·r_1)·exp(n_2·r_2)·exp(n_3·r_3)?

Maple [12] has an Identify(…) function that checks for a variety of forms, together with a PSLQ(…) function in its IntegerRelations package. Mathematica [14] has a built-in integer-relation function named LatticeReduce[…] and a specialized variant named RootApproximant[…] for the polynomial case.

Imagine that, instead of having to know about and invoke such functions, whenever you move the cursor over an approximate number in a result you get a scrollable pop-up box showing one or more rational numbers and/or simple irrational numeric expressions that your result might be approximating, together with the corresponding relative and absolute differences between the approximate and candidate exact values. By automatically running PSLQ once for a quadratic polynomial and once with given constants 1, π and e, you could recover many exact results from approximate ones, as in the sketch below.
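
For example, assuming the third-party mpmath library (its pslq function implements PSLQ; the returned vectors may differ by an overall sign), both uses a) and b) can be sketched in a few lines:

    from mpmath import mp, mpf, pi, sqrt, pslq

    mp.dps = 30                  # work at several times the data's precision

    c = mpf(2) + pi              # pretend c arrived as a bare approximation
    # (a) rational linear combination of 1 and pi:
    print(pslq([c, 1, pi]))      # e.g. [1, -2, -1], meaning c - 2 - pi = 0

    d = 4 + sqrt(2)
    # (b) integer polynomial with d as a zero, from the tuple <1, d, d^2>:
    print(pslq([1, d, d**2]))    # e.g. [14, -8, 1], meaning d^2 - 8d + 14 = 0
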
Just look in the answers section of typical algebra-through-calculus texts: a large percentage of the exact numeric answers are rational numbers, quadratic numbers of the form a + b·√d, or affine rational combinations of π and/or e. The pop-up box could even have an "Advanced" button that allows the user to adjust the maximum degree of the polynomial, the set of real constants, and even the set of expression forms to try, such as suggested by the above exercises.

Better yet, as a background job while you view the result, the computer algebra system could automatically subject all approximate numbers in a displayed result to a search for close exact constants, conspicuously altering the appearance of any approximate constants for which good exact candidates are found. For example, the color could change, or the number could change to resemble a button labeled with the approximate value, subtly inviting a click to view the candidate(s). With personal computers, the computer may as well be searching for such hidden gems while awaiting the next keystroke or mouse activity. Imagine the delight of doing a calculation approximately because of insufficient memory or computer algebra power to do it exactly, then being proffered a promising potential exact result as well! Here too there could be an option to replace the approximate number in the display with any of the alternative exact candidates.

The entries in the pop-up box could be ordered by increasing complexity of the exact candidates, with the complexity threshold and the error thresholds set tight enough to usually preclude ridiculous candidates. The error thresholds could depend on the widths of the intervals. As we allow increasingly long rational coefficients, an increasing-degree polynomial, or an increasing number of independent irrational constants, we can get increasingly close to any approximate result. Therefore complicated results are less plausible than simple ones that also have acceptably small differences from the given approximate result. As a rule of thumb, the number of digits in the resulting expression, totaled over all of its rational numbers, should be limited to significantly less than the number of significant digits in the given approximate number, and the integer relation algorithm should be run using at least several times the number of significant digits in the given approximate number.

There are, of course, many other irrational constants that could be tried besides π and e. For example, Robinson and Potter [16] contains a table of 2498 approximate constants c, ordered by increasing magnitude of their fractional parts. This makes it easy to identify results of the form ±(n + c), where n is an integer having the same sign as c. Binary search would allow this identification to be done in about log_2(2498) ≈ 11 comparisons. These constants have all appeared in the published literature, so each is meaningful for at least one problem. Examples include the twin-prime constant and the lemniscate constant; the constants of Artin, Catalan, Euler, Khintchine, Lehmer and Roth; and the zeros, extrema and special values of various functions such as Bessel functions. An internet search on "mathematical constants" or on "physical constants" reveals many other candidates that could be added to the table. The dictionary by Jonathan and Peter Borwein [2] is a more accessible and thorough collection. There is even an Inverse Symbolic Calculator internet site [10] with a calculator that attempts to identify real numbers that you enter: http://oldweb.cecm.sfu.ca/projects/ISC/

Of course, PSLQ wouldn't run well if you attempted to include too many constants in one model, and combinatorics make it prohibitive to try more than one or two constants together at a time.
However, PSLQ obviates the need for many of the constants in [16], some of which are polynomial zeros or rational multiples of each other. Moreover, PSLQ generalizes the multiples and affine combinations that can be recognized. It would be a valuable project to implement a sufficient subset of this table, first merely scanned with a binary search as intended (sketched below), then perhaps also used one fundamental constant at a time in PSLQ.
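
A sketch of that table lookup, where the four-entry TABLE is a hypothetical excerpt rather than Robinson and Potter's data: store (fractional part, name) pairs sorted by fractional part and binary-search with Python's bisect.

    import bisect
    import math

    TABLE = sorted([(math.pi - 3, "pi"), (math.e - 2, "e"),
                    (math.sqrt(2) - 1, "sqrt(2)"),
                    (0.5772156649015329, "Euler's gamma")])

    def lookup(c, tol=1e-9):
        """Identify c as +/-(n + constant) by binary search on the
        fractional part of |c|."""
        n, frac = math.floor(abs(c)), abs(c) - math.floor(abs(c))
        i = bisect.bisect_left(TABLE, (frac - tol, ""))
        if i < len(TABLE) and abs(TABLE[i][0] - frac) <= tol:
            sign = "+" if c >= 0 else "-"
            return f"{sign}({n} + {TABLE[i][1]})"
        return None

    print(lookup(7.141592653589793))    # +(7 + pi)
    print(lookup(-3.718281828459045))   # -(3 + e)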

Project: Use PSLQ to see if some of the published constants are simply related to each other.

11 Summary

There are many ways that the treatment of numbers can be improved in computer algebra systems so that:

• Exact results can be obtained more often.
• Approximate results are guaranteed to a requested accuracy.
• Results are more intelligible.

These ways include:

• an automatic mode that combines the best features of exact and approximate computation;
• flexible display of exact and approximate numbers in the most concise and intelligible forms, including pop-up menus that offer alternative representations for individual numbers;
• infinitesimally-perturbed numbers;
• exact and adaptive multi-intervals that admit both open and closed endpoints;
• self-validating algorithms;
• generalized limits;
• optional recovery of exact constants from approximate ones.

Some systems have some of these features to some extent. It would be an altogether more satisfying and fruitful experience if your system contained all or most of them, automated as defaults so that even novices immediately enjoy their benefits.

Acknowledgments

I thank Robert Corless, David Jeffrey, Sam Rhoads, Albert Rich, Siegfried Rump and the referee for their helpful suggestions.

References

[1] R. Avitzur, O. Bachmann and N. Kajler. From honest to intelligent plotting. Proceedings of ISSAC 1995, ACM, pp. 32-41.

[2] J. Borwein and P. Borwein. A Dictionary of Real Numbers. Wadsworth & Brooks/Cole, 1990.

[3] J. Carette. Understanding expression simplification. Proceedings of ISSAC 2004, ACM, pp. 72-79.

[4] R. Fateman. Honest plotting, global extrema, and interval arithmetic. Proceedings of ISSAC 1992, ACM, pp. 216-223.

[5] H. R. P. Ferguson and D. H. Bailey. A polynomial time, numerically stable integer relation algorithm. SRC Technical Report SRC-91-xxx, December 1991.

[6] R. W. Gosper. Continued fraction arithmetic. http://www.tweedledum.com/rwg/cfup.htm

[7] E. Hyvönen. Interval input and output. In W. Krämer and J. W. von Gudenberg, editors, Scientific Computing, Validated Numerics, Interval Methods, Kluwer Academic/Plenum Publishers, 2001, pp. 41-52.

[8] IEEE standard for binary floating-point arithmetic. ANSI/IEEE Std 754-1985. (See also IEEE 854-1987 for radix-independent floating point.)

[9] Interval arithmetic: http://www.cs.utep.edu/interval-comp/

[10] Inverse Symbolic Calculator: http://oldweb.cecm.sfu.ca/projects/ISC/

[11] D. J. Jeffrey and A. C. Norman. Not seeing the roots for the branches. ACM SIGSAM Bulletin, Vol 38(3), issue 149, pp. 57-66, 2004.

[12] Maple 11.0.1 on-line help.

[13] MapleSoft Application Center. http://www.maplesoft.com/applications/?p=maple11

[14] Mathematica 6.0.1.0 on-line help.

[15] R. Martin, H. Shou, I. Voiculescu, A. Bowyer and G. Wang. Comparison of interval methods for plotting algebraic curves. Computer Aided Geometric Design, 19, pp. 553-587, 2002.

[16] H. P. Robinson and E. Potter. Mathematical Constants. UCRL-20418, Lawrence Radiation Laboratory, University of California, Berkeley CA 94720, 1971.

[17] S. M. Rump. Computer-assisted proofs and self-validating methods. In B. Einarsson, editor, Handbook on Accuracy and Reliability in Scientific Computation, pp. 195-240. SIAM, 2005.

[18] S. M. Rump. Self-validating methods. Linear Algebra and its Applications, 324:3-13, 2001.

[19] S. M. Rump. INTLAB, the MATLAB interval toolbox. Downloadable from http://www.ti3.tu-harburg.de/rump/

[20] V. Kreinovich and S. M. Rump. Towards optimal use of multi-precision arithmetic: a remark. Reliable Computing, 12, pp. 365-369, 2006.

[21] H. Shou, R. Martin, G. Wang, A. Bowyer and I. Voiculescu. A recursive Taylor method for algebraic curves and surfaces. Proc. Comp. Methods for Algebraic Spline Surfaces, pp. 135-154, 2003.

[22] J. Tupper. Reliable two-dimensional graphing methods for mathematical formulae with two free variables. ACM SIGGRAPH 2001, pp. 77-86.

[23] J. E. Vuillemin. Exact real computer arithmetic with continued fractions. IEEE Trans. Computers 39(8): pp. 1087-1105, 1990.
