Computational Statistics & Data Analysis 50 (2006) 301–310 www.elsevier.com/locate/csda

Statistical functions and procedures in IDL 5.6 and 6.0

Oscar H. Bustos (a), Alejandro C. Frery (b,*)

(a) Facultad de Matemática, Astronomía y Física, Universidad Nacional de Córdoba, Ciudad Universitaria, 5000 Córdoba, Argentina
(b) Departamento de Tecnologia da Informação, Universidade Federal de Alagoas, Campus A.C. Simões, BR 104-Norte, 57072-970 Maceió, AL, Brazil
* Corresponding author. Tel.: +55 82 2141401; fax: +55 82 2141615. E-mail address: [email protected] (A.C. Frery).

Received 17 April 2003; received in revised form 20 August 2004; accepted 30 August 2004
Available online 25 September 2004
doi:10.1016/j.csda.2004.08.011

Abstract

This work presents the results of assessing the accuracy of the statistical routines implemented in the IDL platform, versions 5.6 and 6.0 for Windows XP and Linux. IDL is "a complete computing environment for the interactive analysis and visualization of data. IDL integrates a powerful, array-oriented language with numerous mathematical analysis and graphical display techniques (Research Systems Inc., IDL Versions 5.6 for Microsoft Windows, 2003 and 6.0.1 for Linux x86 m32, 2004. URL http://www.rsinc.com)". It is shown that, though it is an excellent platform for signal and image processing and analysis, it has flaws where statistical computing is concerned, mainly when dealing with non-linear regression by least squares fitting and, in particular, when computing in single precision floating point.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Statistical software; Numerical computation

1. Introduction

A methodology for assessing the dependability of statistical software has recently been proposed (McCullough, 1998, 2000b; McCullough and Wilson, 1999, 2002). This protocol has been applied to the usual first- and second-order univariate statistics, ANOVA, regression (linear and non-linear), cumulative distribution functions and pseudorandom number generation.


Table 1
Number of accurate digits for the computed mean, standard deviation and first-order autocorrelation coefficient, in single and double precision

                 Single                Double
Dataset       x̄     s     ρ1        x̄     s     ρ1
lew            ∞    6.2   2.6        ∞    8.1   2.6
lottery       6.2   6.0   2.0       8.0   8.0   2.0
mavro         5.7   4.6   1.7        ∞    8.0   1.7
michelso      5.8   4.6   3.6        ∞    8.6   3.5
numacc1       6.6    ∞    0.0        ∞     ∞    0.0
numacc2       5.0   5.5   3.3        ∞     ∞    3.3
numacc3       6.6   1.2    —         ∞     ∞    3.3
numacc4       5.5    —     —        7.7    ∞    3.3
pidigits       ∞    6.4   3.9        ∞    7.8   3.9

Table 2
Number of accurate digits in the final F-statistics of the ANOVA analysis

Dataset      LRE (double)   LRE (single)
AtmWtAg          8.3            NA
SiRstv           3.6            NA
SmLs01          14.0            7.4
SmLs02          13.0            7.2
SmLs03          12.0            7.2
SmLs04           8.2            NA
SmLs05           8.0            NA
SmLs06           6.1            NA
SmLs07           2.4            NA
SmLs08           3.4            NA
SmLs09            —             NA
NA, dataset provided in double precision only; no single precision computation.

This work assesses the IDL platform, versions 5.6 and 6.0, a popular and widespread platform for signal and image processing with outstanding graphical display capabilities. The assessment employs the Statistical Reference Datasets provided by the (American) National Institute of Standards and Technology (NIST, 2000), while the accuracy of cumulative distribution functions is checked against the ELV program by Knüsel (1989). The datasets provided by NIST (2000) include both generated and "real world" data, either challenging or designed to challenge specific computations. The ELV program (Knüsel, 1989) computes probabilities and quantiles of the following distributions: standard normal, gamma, χ², beta, F, Student-t, Poisson, binomial and hypergeometric. The upper and lower quantiles of these distributions are computed with six significant digits for tail probabilities ranging from 10⁻¹² to 0.5.


Table 3
Number of accurate digits of the least accurate coefficient β̂ and the least accurate standard error s, in single and double precision

              Single           Double
Dataset       β̂     s         β̂     s
Filip         NA    NA        0.0   0.0
Longley      3.8   0.0        7.5   7.8
Norris       4.6   5.2        7.7   7.7
Pontius      2.2   3.0        8.3   7.7
Wampler1      —     —         6.3   4.1
Wampler2     0.0   0.0         ∞    8.4
Wampler3      —     —         6.3   7.3
Wampler4      —    2.0        6.3   7.3
Wampler5      —    3.2        6.3   7.3
NA, dataset provided in double precision only (see text).

Table 4
Number of accurate digits for the least accurate coefficient β̂, the least accurate standard error s, and the convergence (C) of the algorithm, in double precision

                   Start 1            Start 2            Start 3
Dataset         β̂    s    C       β̂    s    C       β̂     s    C
Bennett5 (H)   0.0  0.0   Y*     0.0  0.0   Y*      ∞    7.1   Y
Boxbod (H)      —    —    N      8.0  7.6   N     12.0  10.0   Y
Chwirut1 (L)   7.4  8.4   Y      7.4  8.4   Y       ∞   10.0   Y
Chwirut2 (L)   7.5  7.8   Y      7.5  7.8   Y       ∞   10.0   Y
Danwood (L)    8.4  8.1   Y      8.4  8.1   Y       ∞   11.0   Y
Eckerle4 (H)    —    —    Y*     7.9  7.9   Y     10.0  10.0   Y
Enso (A)        —    —    Y*     5.1  6.3   Y      9.7  10.0   Y
Gauss1 (L)     7.3  7.4   Y      7.3  7.4   Y       ∞   10.0   Y
Gauss2 (L)     7.3  7.7   Y      7.3  7.7   Y     10.0  10.0   Y
Gauss3 (A)     7.3  8.0   Y      7.3  8.0   Y     10.0   9.5   Y
Hahn1 (A)      7.4  7.6   Y      7.4  7.6   Y     10.0  10.0   Y
Kirby2 (A)     7.7  7.8   Y      7.7  7.8   Y     10.0  10.0   Y
Lanczos1 (A)    —    —    Y*     0.0   —    Y*    10.0   1.4   Y
Lanczos2 (A)    —    —    Y*     0.0   —    Y*    10.0   8.8   Y
Lanczos3 (L)    —    —    Y*     0.0   —    Y*    10.0   8.7   Y
Mgh09 (H)       —    —    Y*     6.0  6.1   Y     11.0  10.0   Y
Mgh10 (H)       —    —    N       —    —    N     10.0   9.5   Y
Mgh17 (A)       —    —    Y*     0.0  0.0   Y*      ∞    9.7   Y
Misra1a (L)    0.0   —    N      8.4  8.0   Y     12.0  11.0   Y
Misra1b (L)    8.1  8.1   Y      8.1  8.1   Y     11.0  10.0   Y
Misra1c (A)    0.0  0.0   Y*     0.0  0.0   Y*     0.0   0.0   Y
Misra1d (A)    7.8  7.7   Y      7.8  7.7   Y     13.0  11.0   Y
Rat42 (H)      7.8  7.4   Y      7.8  7.4   Y     12.0  10.0   Y
Rat43 (H)      7.9  7.6   N      7.9  7.6   Y     11.0  10.0   N
Roszman1 (A)    —    —    Y*     0.0  0.0   Y*      ∞   10.0   Y
Thurber (H)    7.3  6.7   N      6.7  6.7   N     10.0  10.0   N
H, high difficulty; A, average; L, low (per NIST). Y, converges; N, does not converge; Y*, unacceptable situation: convergence is signalled but the result has no correct significant digits.


Table 5
Number of accurate digits for the least accurate coefficient β̂, the least accurate standard error s, and the convergence (C) of the algorithm, for data and starting points in single precision

                Start 1            Start 2
Dataset       β̂    s    C       β̂    s    C
Boxbod         —    —    N      5.1  5.1   N
Chwirut1      5.3  6.3   Y      4.3  4.4   Y
Chwirut2      5.5  5.6   Y      3.3  3.8   N
Danwood       6.4  5.3   N      6.6  5.3   N
Enso          2.2  3.4   Y      2.4  3.6   Y
Hahn1         5.6  2.1   Y      4.8  3.1   Y
Kirby2        6.1  3.2   Y      6.0  3.2   N
Mgh09          —    —    N      3.1  3.2   Y
Misra1a       0.0   —    N      6.8  5.0   N
Misra1b       6.6  5.4   N      6.4  4.9   N
Misra1d       6.8  4.6   N      5.0  5.0   N
Rat42         4.6  4.2   N      7.0  6.1   N
Rat43         6.5  5.8   N      4.4  4.5   N
Y, converges; N, does not converge.

Regarding IDL (Research Systems Inc., 2003/2004), "(it) is a complete computing environment for the interactive analysis and visualization of data. IDL integrates a powerful, array-oriented language with numerous mathematical analysis and graphical display techniques. [...] You can explore data interactively using IDL commands and then create complete applications by writing IDL programs. Advantages of IDL include [...] many numerical and statistical analysis routines provided for analysis and simulation of data [...]". There are approximately 25,000 licenses around the world, and twice that number of users. Among the users of this platform are Porsche, Dornier, Texaco and NASA. IDL integrates naturally with ENVI, a powerful platform for image processing and analysis with geographic information system (GIS) capabilities. This integration makes IDL a very popular statistical analysis tool in the remote sensing community and, therefore, the flaws IDL presents as a statistical package need to be reported, since statistics plays a central role in this field.

Following McCullough and Wilson (2002), in order to compare x, the result obtained by the software under assessment, with the certified value c ≠ 0, the number of correct digits is computed with the log relative error

    LRE(x, c) = −log10(|x − c| / |c|).

Negative values of LRE arise when completely wrong results are obtained; they are denoted by '—' in Tables 1–5. Values 0 < LRE < 1 are reported as 0.0, since less than one significant digit is meaningless. Results like these cast doubt on the numerical quality of a computational platform.
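As a concrete rendering of this definition, the measure can be coded in IDL itself. This is a minimal sketch, not a routine from the platform under test; the function name LRE is ours, and only the built-ins ALOG10 and ABS are used:

    FUNCTION LRE, x, c
      ; Log relative error: the approximate number of correct significant
      ; digits of x with respect to the certified value c (assumed nonzero).
      ; An exact match (x EQ c) yields an infinite LRE, reported as
      ; infinity in the tables.
      RETURN, -ALOG10(ABS(x - c) / ABS(c))
    END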

Section 2 presents the results obtained using IDL on the Statistical Reference Datasets (NIST, 2000), while Section 3 compares IDL and ELV for the computation of quantiles of the standard normal, χ², F and Student-t distributions. Section 4 briefly discusses the random number generation capabilities of IDL. Finally, Section 5 discusses the results and makes general recommendations.

2. Results on first- and second-order statistics

2.1. Univariate summary statistics

NIST provides datasets with certified values for the mean, standard deviation and lag-1 autocorrelation coefficient, in order to assess the accuracy of univariate summary statistic calculations. Computational inaccuracy has three sources: truncation error, cancellation error and accumulation error. Truncation error is the inexact binary representation error incurred when storing decimal numbers under IEEE standard arithmetic. Cancellation error occurs when analyzing data with low relative variation, i.e., numbers with a "high level of stiffness": values sharing the same most significant digits. Accumulation error grows in direct proportion to the total number of arithmetic operations, that is, to the number of observations in the univariate case. NIST includes both generated and "real world" datasets, so that computational accuracy can be examined at different stiffness and accumulation-error levels.

The real world datasets are lew (200 integer observations ranging from −579 to 300), lottery (218 integer observations ranging from 4 to 999), mavro (50 observations with five leading digits, ranging from 2.00130 to 2.00270) and michelso (100 measures of the speed of light as obtained by Michelson & Morley, ranging from 299.620 to 300.070). The first generated dataset (numacc1) consists of the three values 10000001, 10000003 and 10000002. Dataset numacc2 consists of 1001 observations: one occurrence of the value 1.2, and 500 occurrences of the value 1.1 alternating with 500 occurrences of the value 1.3. Dataset numacc3 is very similar to numacc2: one occurrence of the value 1000000.2, and 500 occurrences of the value 1000000.1 alternating with 500 occurrences of the value 1000000.3. Dataset numacc4 is, in turn, very similar to numacc3: one occurrence of the value 10000000.2, and 500 occurrences of the value 10000000.1 alternating with 500 occurrences of the value 10000000.3. Dataset pidigits is the set of the first 5000 digits of the number π.

IDL provides the MOMENT function, which computes the mean, variance, skewness and kurtosis of a sample contained in an n-element vector. The mean and standard deviation can also be computed by the MEAN and STDEV functions, respectively, or simply by calling the latter with the 'MEAN' option. The correlation coefficient is computed by the CORRELATE function, a direct implementation of Pearson's estimator of correlation in the data domain. The certified values were computed using algorithms provided by Netlib (http://www.netlib.org), but no reference to the precise technique used for each dataset is given.

Table 1 presents the number of accurate digits obtained when computing the mean x̄, the standard deviation s and the lag-1 autocorrelation coefficient ρ1, in both single and double precision. The table shows that MEAN behaves fairly well in both single and double precision, that STDEV is excellent in all but two cases (numacc3 and numacc4, in single precision), and that CORRELATE exhibits poor behavior whatever the precision employed.
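For illustration, the numacc3 experiment can be reproduced along the following lines. This is a hedged sketch: the dataset layout follows the description above, the keyword names follow the IDL documentation, and the lag-1 coefficient is obtained here by correlating shifted copies of the series, which is one common construction and not necessarily the exact call used in this assessment:

    ; Build numacc3: 1000000.2 followed by 500 alternating pairs
    ; (1000000.1, 1000000.3), 1001 observations in total.
    data = DBLARR(1001)
    data[0] = 1000000.2D
    FOR i = 1, 999, 2 DO data[i] = 1000000.1D
    FOR i = 2, 1000, 2 DO data[i] = 1000000.3D

    ; Double precision: mean and standard deviation via MOMENT.
    m = MOMENT(data, SDEV=s)
    PRINT, m[0], s

    ; Lag-1 autocorrelation from the Pearson correlation of shifted copies.
    PRINT, CORRELATE(data[0:999], data[1:1000])

    ; Single precision: repeat on a FLOAT copy to expose the lost digits.
    f = FLOAT(data)
    mf = MOMENT(f, SDEV=sf)
    PRINT, mf[0], sf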


2.2. Analysis of variance

IDL 5.6 and 6.0 provide the ANOVA procedure to perform one-way analysis of variance with equal or unequal sample sizes, and two-way analysis of variance with and without interactions. In order to assess the accuracy of this procedure, NIST provides datasets by Simon and Lesage (1989), designed so that the number of constant leading digits is 1, 7 or 13 and the number of observations per cell is 21, 201 or 2001. NIST also provides "real world" datasets: AtmWtAg, with 7 constant leading digits, and SiRstv, with 3 constant leading digits. The documentation states that "for all datasets, multiple precision calculations (accurate to 500 digits) were made using the preprocessor and FORTRAN subroutine package of Bailey (1995, available from NETLIB). Data were read in exactly as multiple precision numbers and all calculations were made with this very high precision. The results were output in multiple precision, and only then rounded to fifteen significant digits".

Since ANOVA produces many numerical results, only the number of significant digits in the final F-statistic is presented in Table 2 (single and double precision for data available in single precision, and double precision only for data provided in double precision). As expected, there is a reduction in the number of significant digits when computing in single precision but, in this case, the reduction is acceptable. Table 2 allows us to say that IDL behaves acceptably well when performing the ANOVA analysis. When using double precision, the worst behavior occurs with the datasets SmLs09 (no significant digits), SmLs07 (approximately two significant digits), and SmLs08 and SiRstv (approximately three significant digits each). It is noteworthy that many good algorithms return zero digits of accuracy for the SmLs09 dataset (McCullough, 2000a), which is considered highly difficult by NIST; this dataset is formed by 9 × 2001 observations of the form 10¹² followed by a single decimal.
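For reference, the certified quantity is the classical one-way F statistic, which can be computed from first principles as below. This is a sketch of the statistic being checked, not of RSI's ANOVA procedure; the function name ONEWAY_F is ours, and integer group labels 0, ..., k−1 are assumed:

    FUNCTION ONEWAY_F, data, group
      ; Classical one-way ANOVA F statistic, double precision throughout.
      ; data:  n-element vector of observations
      ; group: n-element vector of integer group labels 0, ..., k-1
      k = MAX(group) + 1
      n = N_ELEMENTS(data)
      grand = MEAN(data)
      ssb = 0.0D                        ; between-group sum of squares
      ssw = 0.0D                        ; within-group sum of squares
      FOR g = 0, k - 1 DO BEGIN
        sub = data[WHERE(group EQ g, ng)]
        m = MEAN(sub)
        ssb = ssb + ng * (m - grand)^2
        ssw = ssw + TOTAL((sub - m)^2)
      ENDFOR
      RETURN, (ssb / (k - 1)) / (ssw / (n - k))
    END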

2.3. Linear regression

Even with the availability of reliable code for linear least squares fitting, problems persist in the practice of linear regression; failure to use the best algorithms, and to implement them effectively, is often the cause. NIST provides datasets with certified values for the key statistics needed to test linear least squares code. Certified values are quoted to 16 significant digits and are accurate up to the last digit, allowing for possible truncation errors. Of the 11 datasets provided, eight deal with polynomial regression, one is intended for multiple linear regression, and two are tailored to assess the performance of linear regression with null intercept. IDL, unfortunately, does not provide a routine for the last problem.

IDL offers the function REGRESS, which performs a least squares fit for multiple linear regression, and POLY_FIT, which performs a least squares polynomial fit. The former was applied to the datasets Filip, Longley and Norris, and the latter to the remaining datasets. The Filip dataset is provided in double precision and, therefore, no analysis was performed in single precision (thus the 'NA' entries in Table 3).

Among the many results these routines provide, Table 3 shows the minimum LRE for the estimated coefficients β̂ and their standard deviations s, when computing in both single and double precision. From this table stems a recommendation to IDL users: only employ the functions REGRESS and POLY_FIT in double precision calls.

The Filip dataset poses a nearly singular problem for the software to solve. A good package is expected to detect situations like this and to warn the user that proceeding could produce a completely inaccurate answer due to the accumulation of rounding error. The function REGRESS has an optional output variable, STATUS, which contains the status of the operation: '0' for successful completion, '1' if a singular array was detected (indicating that the inversion is invalid), and '2' to warn that a small pivot element was used and that significant accuracy was probably lost. Though '2' was the expected answer for the Filip dataset, IDL returned '0' instead, leading the user to accept the inaccurate results without warning. Single precision computation was not attempted since still worse results are to be expected.
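Following that recommendation, the calls would pass the DOUBLE keyword explicitly. A sketch for a simple linear fit and a degree-5 polynomial fit; x and y stand for a dataset such as Norris or Wampler1, the variable names are ours, and the keyword names follow the IDL documentation:

    ; REGRESS expects the regressors as a [Nterms, Npoints] array,
    ; so a single regressor is reshaped first.
    coef = REGRESS(REFORM(x, 1, N_ELEMENTS(x)), y, $
                   CONST=intercept, SIGMA=se, STATUS=st, /DOUBLE)
    IF st NE 0 THEN PRINT, 'REGRESS status: ', st   ; 1 singular, 2 small pivot

    ; Least squares polynomial fit of degree 5.
    pcoef = POLY_FIT(x, y, 5, SIGMA=pse, /DOUBLE)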

2.4. Non-linear regression

Since non-linear least squares regression problems are intrinsically hard, it is important to assess both the results and the reliability of the interface, i.e., whether or not the code acknowledges having found a solution. NIST provides datasets with "best-available" solutions, obtained with 128-bit precision and confirmed by at least two different algorithms and software packages using analytic derivatives, available from Netlib. Three sets of initial values are provided for each problem, since these routines are particularly sensitive to the starting points:

Start 1: a value relatively far from the certified solution;
Start 2: a value relatively close to the certified solution;
Start 3: the actual certified solution.

For testing purposes it is interesting to see how a code performs when the starting values are not close to the certified solution, and also how it responds when the certified values themselves are used as starting values.

IDL provides the function LMFIT to perform a non-linear least squares fit to a function with an arbitrary number of parameters, using the Levenberg–Marquardt algorithm, which combines the steepest descent and inverse Hessian fitting methods. Upon user request, the function reports whether the algorithm achieved convergence within a maximum number of iterations (the default is 50). This routine is based upon the MRQMIN procedure, a least squares fit to a non-linear function provided by Press et al. (1992). The convergence criterion is based on the number of iterations and on the changes in the sum of squared errors; the user can specify the minimum and maximum numbers of iterations and the tolerance, whose defaults are 5, 50 and 10⁻⁶ (single precision) or 10⁻¹² (double precision), respectively.

Table 4 summarizes the results of applying LMFIT to the NIST datasets, computed in double precision with default settings. Besides the name of each dataset (first column) there is an indication of its level of difficulty according to NIST: high (H), average (A) or low (L). Selected datasets are commented on below.

In Table 4, 'Y*' denotes the unacceptable situations in which the user receives an affirmative convergence signal whilst the result has no significant correct digits. Seventeen of these situations were


found, only for starting points 1 and 2 and double precision computing. These unacceptable situations appear in datasets of high, average and low levels of difficulty alike.

When the starting values are not the certified solutions, the results obtained for the Bennett5, Lanczos1, Lanczos2, Lanczos3, Mgh17 and Roszman1 datasets are bad or unacceptable (negative LRE), though the algorithm issues a message of convergence. The function LMFIT only provides good solutions when started at the certified point, with a few exceptions: the standard errors s for Lanczos1, and every estimate for the Misra1c dataset. For the Boxbod dataset the algorithm only reports convergence when started at the certified values, though the solution obtained from Start 2 is very good. When Start 1 is used as the initial guess for the Eckerle4, Enso and Mgh09 datasets the algorithm reports convergence, but the solutions are in fact very bad. The solutions for Mgh10 are also very bad, but in this case the algorithm produces a warning of non-convergence. With the Misra1c dataset the results are consistently unacceptable, yet no warning is provided: a serious flaw. Though the results for Rat43 and Thurber are fairly good, the algorithm always issues warnings of non-convergence.

Table 5 presents the results of applying LMFIT in single precision to the data provided with single precision starting points, with default routine settings. In this case only Start 1 and Start 2 are used, since Start 3 (the certified values) is provided in double precision. As expected, processing the data in single precision gives poorer results than double precision, with a single unexplained exception (Enso). In every other situation the number of significant digits is reduced, and there are cases where convergence is not achieved in single precision whereas it is achieved in double precision.

From this analysis, the recommendation is that, at the very least, more reliable warning mechanisms should be implemented in the function LMFIT. The user is advised to use this routine only in double precision, and to try several starting points.
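For reference, a double-precision LMFIT call for the Misra1a problem, y = b1(1 − exp(−b2 x)), would look roughly as follows. This is a sketch: the model function name is ours, it returns the fitted value followed by the partial derivatives in the format LMFIT expects, the keyword names follow the IDL documentation, and the Start 1 values (b1 = 500, b2 = 0.0001) are those listed by NIST:

    FUNCTION MISRA1A_MODEL, x, b
      ; Returns [f(x), df/db1, df/db2], the format LMFIT expects.
      e = EXP(-b[1] * x)
      RETURN, [b[0] * (1D - e), 1D - e, b[0] * x * e]
    END

    ; x, y: the Misra1a data in double precision.
    b = [500D, 0.0001D]                 ; NIST Start 1 values
    yfit = LMFIT(x, y, b, FUNCTION_NAME='MISRA1A_MODEL', /DOUBLE, $
                 SIGMA=se, CONVERGENCE=conv, ITMAX=50)
    IF conv NE 1 THEN PRINT, 'LMFIT reports non-convergence'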

3. Results on inverse distribution functions

Inverse cumulative distribution functions are needed in hypothesis testing and pseudorandom number generation, among other applications. IDL offers routines to compute the inverse cumulative distribution functions of the χ², F, Gaussian and Student-t laws through the functions CHISQR_CVF, F_CVF, GAUSS_CVF and T_CVF. In order to assess the accuracy of these functions, the ELV program by Knüsel (1989) was used. Table 6 shows the worst results, which occur at small tail probabilities (P ≤ 10⁻⁵) and low degrees of freedom: the χ² with 1 degree of freedom, the F with 1 and 1 degrees of freedom, and the Student-t with 1 degree of freedom. From this table one concludes that these functions should only be used in double precision.

Table 6
Number of accurate digits of the quantiles computed at tail probability P, in single and double precision

Function        P           Single   Double
GAUSS_CVF       2 × 10⁻⁷     1.9      6.0
T_CVF           2 × 10⁻⁷     0.0      6.2
CHISQR_CVF      2 × 10⁻⁷     1.6      6.3
F_CVF           10⁻⁵         2.4      6.0
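The calls under test take the upper-tail probability and the degrees of freedom directly. A sketch of the worst cases in Table 6, passing double-precision arguments on the assumption that the argument type selects the working precision:

    PRINT, GAUSS_CVF(2D-7)          ; standard normal cutoff
    PRINT, T_CVF(2D-7, 1)           ; Student-t, 1 degree of freedom
    PRINT, CHISQR_CVF(2D-7, 1)      ; chi-squared, 1 degree of freedom
    PRINT, F_CVF(1D-5, 1, 1)        ; F, 1 and 1 degrees of freedom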

4. Random number generation

The random number generator provided by IDL is the one by Park and Miller (1988), improved by a Bays–Durham shuffle. It is a linear congruential generator and thus suffers from undesired structures in high dimensions. The documentation neither states typical periods of the sequences nor suggests suitable seeds, and the undesirable properties of this class of generators are reported in Entacher (2000). The platform also offers functions for the generation of binomial, gamma (restricted to integer shape parameter values), uniform integer, Gaussian and Poisson deviates; according to the documentation, they implement algorithms provided by Press et al. (1992).
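These generators are exposed through the RANDOMU and RANDOMN functions; a minimal sketch of the deviates mentioned above, with keyword names taken from the IDL documentation (the seed variable is updated in place across calls):

    seed = 42L
    u  = RANDOMU(seed, 1000)                      ; uniform on (0, 1)
    g  = RANDOMN(seed, 1000)                      ; standard Gaussian
    p  = RANDOMU(seed, 1000, POISSON=4.0)         ; Poisson, mean 4
    b  = RANDOMU(seed, 1000, BINOMIAL=[10, 0.5])  ; binomial, n=10, p=0.5
    ga = RANDOMU(seed, 1000, GAMMA=3)             ; gamma, integer shape 3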

5. Conclusions

The software under assessment does not behave well when computing standard statistical functions in single precision. Regarding basic statistics, the function MEAN exhibits the best performance, both in single and double precision, but STDEV is unreliable and CORRELATE shows poor behavior whatever the precision employed. The platform behaves well when performing the ANOVA analysis in most cases, failing only on a highly difficult dataset. The functions provided for linear and polynomial regression fail to provide sensible results when used in single precision; they improve considerably when called with double precision parameters. The same holds for the computation of quantiles. The results obtained in non-linear regression are erratic, depending on both the dataset and the starting point.

In order to improve the platform, some utilities should be added, for instance the computation of regression without intercept and reliable warning messages, in particular for the non-specialist user. Other random number generators should also be offered, for instance Marsaglia's and L'Ecuyer's, documented and implemented in Ox (Doornik, 1998).

Acknowledgements

The authors are grateful to CNPq for the partial support of this research.

References

Doornik, J.A., 1998. Ox: An Object-Oriented Matrix Programming Language, fourth ed. Timberlake Consultants & Oxford, London. URL http://www.nuff.ox.ac.uk/users/doornik
Entacher, K., 2000. A collection of classical pseudorandom number generators with linear structures: advanced version. URL http://www.crypto.mat.sbg.ac.at/results/karl/server/server.html
Knüsel, L., 1989. Computation of statistical distributions. URL http://www.stat.uni-muenchen.de/~knuesel
McCullough, B.D., 1998. Assessing the reliability of statistical software: Part I. Amer. Statist. 52, 358–366.
McCullough, B.D., 2000a. Experience with StRD: application and interpretation. Comput. Sci. Statist. 31, 16–21.
McCullough, B.D., 2000b. The accuracy of Mathematica 4 as a statistical package. Comput. Statist. 15, 279–299.
McCullough, B.D., Wilson, B., 1999. On the accuracy of statistical procedures in Microsoft Excel 97. Comput. Statist. Data Anal. 31, 27–37.
McCullough, B.D., Wilson, B., 2002. On the accuracy of statistical procedures in Microsoft Excel 2000 and Excel XP. Comput. Statist. Data Anal. 40 (4), 713–721.
National Institute of Standards and Technology, 2000. The statistical reference datasets. URL http://www.itl.nist.gov/div898/strd/
Park, S., Miller, K., 1988. Random number generators: good ones are hard to find. Commun. ACM 31, 1192–1201.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 1992. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge.
Research Systems Inc., 2003/2004. IDL versions 5.6 for Microsoft Windows and 6.0.1 for Linux x86 m32. URL http://www.rsinc.com
Simon, S.D., Lesage, J.P., 1989. Assessing the accuracy of ANOVA calculations in statistical software. Comput. Statist. Data Anal. 8 (3), 325–332.
