JOURNAL OF RESEARCH of the National Bureau of Standards - B. Mathematical Sciences, Vol. 77B, Nos. 3 & 4, July-December 1973
Performance Testing of a FORTRAN Library of Mathematical Function Routines - A Case Study in the Application of Testing Techniques
D. W. Lozier, L. C. Maximon, and W. L. Sadowski
Institute for Basic Standards, National Bureau of Standards, Washington, D.C. 20234
(April 12, 1973)
This paper describes an application of testing methodology and techniques developed by the authors to aid in improving the quality of mathematical software. These techniques differ radically in several aspects from techniques previously used, the most important difference being that the testing is not based exclusively on random arguments. Instead, throughout the range of each function use is made of special arguments that are designed to detect programming errors and to test the performance of an algorithm in different regions. The function values are tested against reference values which are stored on reference tapes generated by a highly authenticated system of subroutines. Since the effectiveness of such a testing system in discovering errors and performance limitations can be fully ascertained only through actual use, we report the results of employing our system to test an existing FORTRAN library of mathematical function routines. Specific aspects of the numerical accuracy of the library used in this test case are discussed in order to illustrate the effectiveness of a well-designed testing system as an analytic tool for the evaluation of mathematical software. Since documentation provides information necessary to perform testing and contains specifications that reflect the results of testing, our study includes comments on the documentation. No information on timing or storage requirements is presented in this case study.
Key words: Automated testing; bit comparison; FORTRAN library; function validation; mathematical functions; performance tests.
1. Introduction
This paper presents the results of a case study of a FORTRAN library of mathematical function routines which serves to demonstrate the effectiveness of a testing program developed by the authors [1].¹ To achieve this purpose we include in the paper a general evaluation of the library and its documentation as well as data on its numerical accuracy. Programming errors discovered by the application of our techniques are pointed out and suggestions for improvement are made.
The particular library which served as our test case is the UNIVAC 1108 FORTRAN library of mathematical functions. This library was released by UNIVAC for the EXEC 8 operating system in 1971; the RL 24 version reported on in this paper is a release that contains modifications enabling it to run under EXEC II. This case study should not be construed as a validation or certification of the manufacturer's software and the authors would like to stress the fact that no endorsement of the manufacturer's library is implied.
AMS Subject Classification: 68A10.
¹ Figures in brackets indicate the literature references at the end of this paper.
2. Testing Techniques
Below is a short discussion of some of the salient features of our testing approach.
1. The library function values were tested by comparing them with reference values. The latter were obtained through the use of a software system containing an arbitrary precision arithmetic package and a set of arbitrary precision algorithms developed by us at the National Bureau of Standards [2]. This eliminates the necessity of recourse to the software on a computer with higher precision to supply the reference values, and permits the testing of software on a computer with any wordlength. To minimize the effect of the local computer environment on the performance of our software testing system, we store the reference function values on tape. The comparison between library function values and the reference values is performed by our Bit Comparison Program designed for highly automated testing [1].
2. The performance of the testing system has been authenticated by a series of rigorous tests that included checks with published tables [1].
3. To avoid conversion errors in the arguments, only exact machine (octal) arguments were used to test the mathematical function subroutines. For the same reason, function values are displayed in octal form. In this way all conversion errors are avoided.
4. Since no theory exists at the present time that would make it possible to construct a set of random arguments that are certain to be representative of the total population of allowable arguments, thus providing a statistical basis for testing, exclusive use of random arguments must currently be considered inadequate for function testing. Special arguments must be used to test certain aspects of performance. It is for this reason that we do not use statistical terms in evaluating the performance of the library. We test functions on a logarithmic scale by supplying arguments in each power of two throughout the entire argument range. Among these are both special and random arguments. Arguments with special bit configurations are designed to probe for weaknesses caused by specific features of the hardware. In view of the testing on a logarithmic scale, the percentages quoted in our breakdown of errors may differ significantly from those given by the manufacturer. Whereas the percentage of full bit accuracy in the various argument ranges is of importance to the tester and the mathematician who does research in algorithms, the user very frequently may not know the range in which his program will generate arguments as a part of a larger calculation. For such a user the most important index of the accuracy of a mathematical function subroutine is its maximum error and this is the index adopted in this study.
5. It is very important to have a clear definition of the error measure used in describing the results of any given performance test. If the definition given in the manufacturer's literature (UP-7876, pp. 2-3, 2-4) were applied to the results of their tests it would show the library in a worse light than it deserves. In fact, interpreting their performance results according to our definition of error measure gives a good agreement with our test results. Therefore, comparison between the manufacturer's test data and ours is made on the basis of our definition of the measure of error. We call our measure of error the "mantissa error." It is the difference resulting from a fixed point subtraction of the mantissa of the reference function value from the mantissa of the library function value. Prior to this subtraction the library function value is normalized to the characteristic of the reference value. The difference is expressed in units of the last bit position (ULP) [3], [4]. For example, an error of 3 ULP implies that the last two binary digits of the library value are in error but not the last three. The mantissa error is equal to the relative error to within a factor depending on the normalization of mantissas in the computer used (a factor of 2 in this case, where mantissas are normalized to lie between 1/2 and 1).
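The mantissa error of item 5 and the per-power-of-two arguments of item 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Bit Comparison Program: the function names, the 27-bit default mantissa length, and the particular choice of special bit patterns are all hypothetical, and Python floats and exact fractions stand in for machine words and reference tapes.

```python
import math
import random
from fractions import Fraction

MANTISSA_BITS = 27  # assumed mantissa length; adjust for the machine under test


def mantissa_error_ulp(library_value, reference_value, mantissa_bits=MANTISSA_BITS):
    """Signed mantissa error of library_value against reference_value, in ULP."""
    if reference_value == 0:
        raise ValueError("reference value must be nonzero")
    # frexp gives reference = m * 2**e with 1/2 <= |m| < 1, the normalization in the text.
    _, ref_exp = math.frexp(reference_value)
    # Normalize both values to the reference characteristic, then subtract in fixed point.
    scale = Fraction(2) ** (mantissa_bits - ref_exp)
    return Fraction(library_value) * scale - Fraction(reference_value) * scale


def relative_error_bounds(ulp_error, mantissa_bits=MANTISSA_BITS):
    """Relative error range implied by a mantissa error (the factor of 2 in the text)."""
    unit = abs(ulp_error) * Fraction(1, 2 ** mantissa_bits)
    return unit, 2 * unit  # mantissa near 1 gives the lower value, near 1/2 the upper


def binade_arguments(exponent, mantissa_bits=MANTISSA_BITS, n_random=4, seed=0):
    """Exact machine arguments in [2**(exponent-1), 2**exponent): special plus random."""
    lowest = 1 << (mantissa_bits - 1)       # mantissa bits 100...0
    highest = (1 << mantissa_bits) - 1      # mantissa bits 111...1
    alternating = int(("10" * mantissa_bits)[:mantissa_bits], 2)  # mantissa bits 1010...
    special = {lowest, lowest + 1, alternating, highest - 1, highest}
    rng = random.Random(seed)
    randoms = {rng.randrange(lowest, highest + 1) for _ in range(n_random)}
    # ldexp is exact here as long as mantissa_bits does not exceed 53.
    return [math.ldexp(m, exponent - mantissa_bits) for m in sorted(special | randoms)]
```

Calling `binade_arguments(e)` for every exponent `e` in the argument range gives the logarithmic coverage described in item 4, and taking the largest `abs(mantissa_error_ulp(...))` over all arguments gives the maximum error index adopted in this study.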
3. Comments on Manufacturer's Documentation
The documentation will be commented upon from two points of view, namely from that of the user and that of the tester [5]. The documentation contained in the FORTRAN V Library manual provides information deemed necessary for the user. It is well written and contains a wealth of information on each routine, including the test data. However, in a few instances the documentation contains inaccuracies. For example, the statement $r^2 + s^2 < 2^{128}$ on p. 2-51 (sec. on CLOG, subsection Argument and Function Range) should read $2^{-129} \le r^2 + s^2 < 2^{127}$. On p. 2-52 the statement that error termination results if $r^2 + s^2 > 2^{128}$ or $U < 2^{-128}$ is incorrect. It should read $r^2 + s^2 \ge 2^{127}$ or $r^2 + s^2 = 0$. No error termination results for the real part of the function less than $2^{-128}$.
The task of testing would be facilitated by the identification of mathematical methods and constants with appropriate references, such as Hart et al., Computer Approximations [8]. Furthermore, alongside the list of functions referenced by a given library routine, a listing of the library routines which call the given function would aid in the cross-referencing and the tracing of the effects that errors in a given routine have on the performance of other routines. Modifications of the source code should be reflected in user documentation more promptly. For example, the September 1972 update of the user manual (UP-7876) does not contain mention of the crossover point for small arguments incorporated into the source code DSINCO at least since January 1971.
The source code and the comments contained therein constitute documentation for the tester. They are concerned with the computer implementation of the mathematical algorithms. Two practices, if adopted uniformly throughout the library, would save time and effort in testing and evaluation: the clear identification of all constants in octal (those used to compute function values as well as those used for logical decisions, examples being crossover points and points at which error tracing begins) and the identification at the appropriate point in the coding of the quantities being tested against these latter constants.
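The CLOG range statements quoted above are phrased in terms of r² + s², a quantity that can overflow even when the modulus (r² + s²)^(1/2) is itself representable. The Python sketch below illustrates the difference using double precision floats as a stand-in for the machine's words. It is not the library's algorithm, and the names `log_modulus_naive` and `log_modulus_scaled` are hypothetical.

```python
import math


def log_modulus_naive(r, s):
    """log|z| via r*r + s*s; the intermediate sum can overflow to infinity."""
    return 0.5 * math.log(r * r + s * s)


def log_modulus_scaled(r, s):
    """log|z| computed without forming r*r + s*s at full magnitude."""
    big, small = max(abs(r), abs(s)), min(abs(r), abs(s))
    if big == 0.0:
        # Corresponds to the r**2 + s**2 = 0 error condition in the text.
        raise ValueError("log of zero modulus is undefined")
    ratio = small / big  # always in [0, 1], so its square cannot overflow
    return math.log(big) + 0.5 * math.log1p(ratio * ratio)
```

With r = s = 1e200 the naive form returns infinity because r*r overflows, while the scaled form returns a finite value close to log(1e200) + log(2)/2.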
4. General Evaluation of the Library
The present library shows that attention has been paid to the choice of good algorithms based on numerical analysis. The computer implementation of these algorithms shows attention to the loss of significance, particularly in the process of argument reduction.² Extended precision coding has generally been supplied to avoid loss of accuracy, which requires special programming to circumvent the limitations of hardware (lack of guard digits, finite word length, floating point normalization, etc.) and of instruction repertoire (lack of double precision fixed point instruction for multiply and divide, etc.).
It is appropriate at this point to stress the importance of good testing techniques in developing a mathematical library of high quality. Those subroutines that have been tested with the techniques developed at the Jet Propulsion Laboratory [6], namely the single and double precision subroutines [7], exhibit painstaking attention to detail and have fewer programming errors than the complex function routines.
To minimize the effect of hardware limitations, some constants are "fine-tuned" by slightly modifying their bit representation. For example, in the DSINH routine, the octal machine constant 200140...00, representing unity on lines 93 and 94 of the source code, was changed to 200140...01 to decrease the number of two bit errors. This technique could be used to a still greater extent, however, and we occasionally make recommendations to that effect in the body of this report. The effect of these recommendations has been explicitly evaluated only for the routines DSINH and DSINCO.
The specifications for most routines in the library are very good.
The general philosophy of accepting all arguments for which the function value can be held in the machine is adhered to for the most part. However, the specifications for certain complex functions (CLOG, CCBRT) are unduly restrictive in that the limit of the argument range is based on an overflow condition for $r^2 + s^2$ rather than $(r^2 + s^2)^{1/2}$. We found the performance of the subroutines to be generally in conformity with the specifications, although, where the maximum error is concerned, the subroutines do not always come up to specifications. One particular example of this concerns the single precision functions, where the specifications give a maximum error of one ULP. This is indeed verified, with the single ex-
² This is in marked contrast to its immediate predecessor (March 23, 1966 update of the model 1107 FORTRAN Library Subroutines manual - UP-3947), which like several libraries of different origin contains certain weaknesses due to inadequate numerical analysis, such as the loss of all significance in the calculation of sinh x for small x by the use of $\sinh x = \frac{1}{2}(e^{x} - e^{-x})$.
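The loss of significance mentioned in footnote 2 is easy to reproduce. The sketch below uses Python double precision rather than the 1107 or 1108 arithmetic, and the series-based alternative is only one common remedy, not the method used in either library. Both function names are hypothetical.

```python
import math


def sinh_naive(x):
    """sinh x from the defining formula; cancels catastrophically for small |x|."""
    return 0.5 * (math.exp(x) - math.exp(-x))


def sinh_small(x, terms=4):
    """Truncated Taylor series x + x**3/3! + x**5/5! + ..., intended for small |x|."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        # Next odd-power term: multiply by x**2 / ((2k+2)(2k+3)).
        term *= x * x / ((2 * k + 2) * (2 * k + 3))
    return total
```

For x well below the rounding unit of the arithmetic, both exponentials round to 1 (or to its immediate neighbors), so the naive difference carries no correct digits, whereas the series returns x to full working precision.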
TABLE 1. Performance of library functions

Columns: Entry point; No. of arguments; Max error (ULP); Percentage of values with ULP error of 0, 1, 2, 3, 4; Suggested improvements; Programming errors; Comments.

Entry point            No. of arguments
SIN                    2888
DSIN                   8140
CSIN (real, imag)      2874
COS                    2888
DCOS                   8140
CCOS (real, imag)      2877
TAN                    5693
DTAN                   8174
CTAN (real, imag)      2881
COTAN                  5678
DCOTAN                 8159
ASIN                   1839
DASIN                  9037
ACOS                   1839
DACOS                  9037
ATAN                   3870
DATAN                  11353
ATAN2(X1, X2)          699
DATAN2(X1, X2)
SINH                   1384
DSINH                  5906
CSINH (real, imag)     2874
COSH                   1384
DCOSH                  5906
CCOSH (real, imag)     2877
TANH                   1426
DTANH                  5906

[Entries of the remaining columns are not legible in this copy of the table.]

Entries of the Comments column:
4-ULP errors due to COS routine.
Max. error exceeds specs.
Error traceable to TANCOT routine.
Fails strict monotonicity in range $2^{-26}$ to $2^{-14}$. Max. error exceeds specs.
Max. error exceeds specs. The arguments were heavily weighted towards X1/X2 very small or very large to test the management function of the entry point. See section on Suggested Improvements, Errors and Specific Comments.
Max. error exceeds specs. 4-ULP errors due to COS routine.
TABLE 1. Performance of library functions - Continued
Entry point