Accred Qual Assur (2009) 14:185–192 DOI 10.1007/s00769-009-0490-2
GENERAL PAPER
Non-parametric estimation of reference intervals in small non-Gaussian sample sets

Johan Bjerner · Elvar Theodorsson · Eivind Hovig · Anders Kallner
Received: 24 October 2008 / Accepted: 16 January 2009 / Published online: 12 February 2009
© Springer-Verlag 2009
Abstract This study aimed at validating common bootstrap algorithms for reference interval calculation. We simulated 1500 random sets of 50–120 results originating from eight different statistical distributions. In total, 97.5 percentile reference limits were estimated from bootstrapping 5000 replicates, with confidence limits obtained by the (a) normal, (b) standard error, (c) bootstrap percentile (as in RefVal), (d) BCa, (e) basic, or (f) student method. Reference interval estimates obtained with ordinary bootstrapping and confidence intervals by the percentile method were accurate for distributions close to normality and devoid of outliers, but not for log-normal distributions with outliers. Outlier removal and transformation to normality improved reference interval estimation, and the basic method was superior in such cases. In conclusion, if the neighborhood of the relevant percentile contains non-normally distributed results, bootstrapping fails. The distribution of bootstrap estimates should be plotted, and a non-normal distribution should warrant transformation or outlier removal.
Electronic supplementary material The online version of this article (doi:10.1007/s00769-009-0490-2) contains supplementary material, which is available to authorized users.
J. Bjerner (✉)
Department of Medical Biochemistry, Rikshospitalet Medical Center, Oslo, Norway
e-mail: [email protected]

J. Bjerner
Dr. Fürst Medical Laboratory, Søren Bulls vei 25, 1051 Oslo, Norway

E. Theodorsson
IKE/Clinical Chemistry, Linköping University Hospital, Linköping, Sweden

E. Hovig
Bioinformatics Core Facility, Institute of Cancer Research, Norwegian Radium Hospital, Rikshospitalet University Hospital, Oslo, Norway

A. Kallner
Department of Clinical Chemistry, Karolinska University Hospital, Stockholm, Sweden

Keywords Reference intervals · Bootstrap · Re-sampling · Algorithm · Non-parametric · Percentile · Confidence intervals · Gaussian · Distribution

Introduction
In medical biochemistry, a reported level of a biochemical quantity is usually accompanied by a reference interval. Biological reference intervals are conventionally defined in laboratory medicine as covering the central 95% of a reference distribution composed of analyte values obtained by measurement of samples collected from reference individuals selected randomly from the reference population [1]. The interest in obtaining accurate reference limits for various biochemical quantities appears to be surging, as reflected, for example, in an increasing number of manuscripts on the subject submitted to the Scandinavian Journal of Clinical and Laboratory Investigation, where three of the authors have been involved with the editorial handling of manuscripts. When the journal during the summer of 2007 received two manuscripts containing relatively few observations, but with conspicuously narrow confidence limits calculated using the non-parametric bootstrap routine in RefVal, we decided to make a detailed investigation of the
non-parametric algorithms used for calculating the reference intervals, including their confidence intervals.

Reference interval studies generally adhere to IFCC recommendations, recently summarized [1]. Amongst the most important recommendations is a sample size of at least 121. However, IUPAC has published a supplement to the IFCC recommendations that allows reference interval calculation even with sample sizes down to 50 [2]. Calculations are generally performed using dedicated software such as RefVal [3] or CBstat [4], or software implementing similar algorithms. The initial versions of RefVal were based on a parametric method employing a two-stage transformation to a Gaussian distribution. More recent versions include the additional option of a non-parametric bootstrap [5]. The bootstrap estimator for an upper 97.5 percentile reference limit employed in RefVal is:
O[k] + (O[k+1] − O[k]) × (0.975(N+1) − k),

where N is the population size, k the largest integer smaller than 0.975(N+1), and O an ordered vector of observations. This estimator will subsequently be referred to as "Solberg". Linnet proposed a slight modification of the "Solberg" estimator, which is employed by CBstat [6]:

O[k] + (O[k+1] − O[k]) × ((0.975N + 0.5) − k),

where N is the population size, k the largest integer smaller than 0.975N + 0.5, and O an ordered vector of observations. This modified estimator will subsequently be referred to as "Linnet" in the text. For a review of bootstrap procedures in medical biochemistry, see Henderson [7].

Linnet validated the non-parametric bootstrap estimators "Solberg" and "Linnet" and found both to return fairly unbiased and efficient estimates of an upper 97.5 percentile [6]. However, the validation was performed by simulating unimodal distributions only. The distributions of analyte concentrations in healthy subjects are, however, not always unimodal, exemplified by the distribution of CA19.9, which is approximately log-normal but has a small "contamination" of outliers [8]. We have previously noted that such distributions cause problems in the parametric version of RefVal, which may be conveniently remedied [9]. In the present study, we have therefore, in addition to the test distribution from Linnet and the test distributions from Horn [10], included two log-normal distributions with contaminations added, similar to the distributions of CA19.9 and thyroglobulin. Later in the study we will show that the proposed re-sampling methods of Linnet and Solberg do not return robust estimates for those distributions.

Confidence intervals in RefVal or CBstat are estimated by the percentile method. Linnet noted that the confidence intervals obtained by the percentile method turned out too narrow for small sample sets [6]. This bias of the confidence limits will increase with increasing skewness of the data-set, and as the data-sets from the previous validation studies of the bootstrap procedures [6, 10] were all less skewed than the data-sets we have seen in a study of reference intervals of tumor markers [8], we expected this to be a major shortcoming of the percentile method of obtaining confidence limits in the "RefVal" and "CBstat" applications.

In the present study, we introduce two different approaches for reducing the bias of the calculated reference intervals and their associated confidence intervals. We first tested the effect of reducing data skewness prior to the calculations, by either transforming the data to normality or by removing outliers, and then tested the effect of applying methods of confidence interval calculation that are less sensitive to skewness. We thus included several alternative methods to calculate confidence intervals, some of which were not available at the time when RefVal and CBstat were written.
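For concreteness, the two estimators defined above may be written out in a few lines of R (a minimal sketch under our own naming, not the published RefVal or CBstat code; the vector x of reference values and the guard against ranks beyond the largest observation are our additions):

upper_limit_solberg <- function(x, p = 0.975) {
  o <- sort(x)                 # ordered vector of observations O
  n <- length(o)
  r <- p * (n + 1)             # target rank 0.975(N+1)
  k <- ceiling(r) - 1          # largest integer smaller than the target rank
  if (k >= n) return(o[n])     # guard: rank falls beyond the largest observation
  o[k] + (o[k + 1] - o[k]) * (r - k)
}

upper_limit_linnet <- function(x, p = 0.975) {
  o <- sort(x)
  n <- length(o)
  r <- p * n + 0.5             # Linnet's modified target rank 0.975N + 0.5
  k <- ceiling(r) - 1
  if (k >= n) return(o[n])
  o[k] + (o[k + 1] - o[k]) * (r - k)
}

Applied to, for example, 120 simulated log-normal results (x <- rlnorm(120, meanlog = 3, sdlog = 0.5)), both functions return an interpolated upper 97.5 percentile reference limit.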
Materials and methods

Computer simulation

Ordinary bootstrapping and calculation of confidence intervals using the percentile method are convenient and may be performed by means of astonishingly few lines of programming code. By "ordinary bootstrapping", we will in this paper refer to bootstrapping without importance sampling or tilting. In the current study, we also wanted to explore some more advanced features of bootstrapping, and we thus employed the functions of the bootstrap library boot in R [11], based on the original functions in S-plus written by Canty [12]. For a review of the functions included in this library and of other functions available for calculating bootstrap confidence intervals, the review of Carpenter and Bithell [13] is recommended.

We included both of the two similar algorithms proposed by Solberg [5] and Linnet [6] for returning a bootstrap estimate of the 97.5% reference limit. Bootstrap estimates were obtained with 5000 re-samplings for each set. Box–Cox transformation was employed to render normally distributed data sets, and removal of outliers followed the use of Tukey's fence as described previously [9]. All re-samplings were performed with ordinary re-sampling. We initially also tried exponential tilting [14], but found it to be slower and slightly less accurate, combined with occasional failures, probably due to the creation of systems of linear equations lacking a solution. We thus discarded exponential tilting from the study. Two-sided 90% confidence intervals were obtained by simply applying the options in the R library boot: (a) normal, (b) basic, (c) from standard error (in most regards like basic, but basic is calculated directly from the library whereas
the standard error version is calculated indirectly), (d) bootstrap percentile (as in RefVal and CBstat), (e) BCa, and (f) student. The first five options are straightforward applications of the R functions, but the student method requires the user to express the standard error of the bootstrap estimate of each sample set in closed form. This is difficult, and we chose to obtain an estimate of the standard error by a simple rankit transformation and extraction of observations close to the 10th and 90th percentiles. This is an unbiased but inefficient method, thus providing good estimates of confidence limits in large sample sets; for small sample sets, however, the inefficiency results in too large an uncertainty of the estimated confidence limits. By finding a more efficient method of calculating the standard error, the student method may certainly be improved.

We chose to test eight different distributions, specified below. From each test distribution, we created 1500 random data sets of sizes 50, 60, 70, 80, 90, 100, and 120. For each data set, upper 97.5 percentile reference limits were calculated using 5000 re-samplings of the same size as the original data set. The bootstrap algorithms tested were "Solberg" and "Linnet". For each algorithm we also tested transformation towards normality (Box–Cox transformation), removal of outliers, and the combination of both measures. Following removal of outliers, one may choose between adjusting the reference limits or not adjusting them, and we kept both options in the study.

To investigate the reference limits obtained by bootstrapping, we compared those limits to the true limits obtained directly from the test distributions. We have plotted both the root mean squared error (RMSE), as in [6], as a measure of the combined bias and imprecision, and the mean bias alone. We then investigated the calculated confidence intervals for those reference limits. We first obtained the "true" 90% confidence limits for each distribution and size of data set by extracting the 5th and 95th percentiles of the reference limits of all 1500 data sets. We then extracted the 200 most central data sets and compared the calculated confidence limits from those sets with the "true" confidence limits by calculating the RMSE and the mean bias. Simulations and plots were all made by scripts in R. The scripts can be obtained from the corresponding author for free use, e.g., for validating the findings of this study.
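The two pre-processing steps may be sketched as follows (our own illustration; the Tukey fence multiplier of 3 and the use of MASS::boxcox to choose the transformation parameter are assumptions and not necessarily identical to the procedure of reference [9]):

library(MASS)

remove_tukey_outliers <- function(x, k = 3) {
  # keep observations within [Q1 - k*IQR, Q3 + k*IQR]
  q <- quantile(x, c(0.25, 0.75))
  iqr <- q[2] - q[1]
  x[x >= q[1] - k * iqr & x <= q[2] + k * iqr]
}

boxcox_transform <- function(x) {
  # x must be strictly positive; lambda is chosen by profile likelihood
  bc <- boxcox(x ~ 1, plotit = FALSE)
  lambda <- bc$x[which.max(bc$y)]
  if (abs(lambda) < 0.01) log(x) else (x^lambda - 1) / lambda
}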
Distributions

We simulated random sets from eight different distributions:

1. a log-normal distribution ("LogNormal1"), having a meanlog of 3 and a sdlog of 0.5
2. a log-normal distribution ("LogNormal2"), having a meanlog of 3 and a sdlog of 0.5, and containing a 0.5% contamination having a meanlog of 6 and a sdlog of 0.5
3. a log-normal distribution ("LogNormal3"), having a meanlog of 3 and a sdlog of 0.5, and containing a 1.0% contamination having a meanlog of 6 and a sdlog of 0.5
4. a mixture of two equally sized Gaussian distributions ("MixedNormal"), one having a meanlog of 2.7 and a sdlog of 0.5 and the other having a meanlog of 3.3 and a sdlog of 0.5
5. the skew distribution from Linnet [6] ("SkewedLinnet"), here set to have a meanlog of approximately 3 and a sdlog of approximately 0.5
6.–8. chi-squared distributions with 4 ("ChiHorn2"), 7 ("ChiHorn3") and 10 ("ChiHorn4") degrees of freedom, adapted from Horn [10]. All were set to have a meanlog of approximately 3 and a sdlog of approximately 0.5.

We also tried a chi-squared distribution with 1 degree of freedom, but the generation of random sample sets having this distribution was unfortunately very slow, so this distribution was discarded.
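A minimal sketch of one simulation step, assuming the upper_limit_solberg() function and the pre-processing sketches given above, could look as follows: a sample is drawn from the "LogNormal2" distribution, bootstrapped 5000 times with the boot package, and 90% confidence intervals are requested by the normal, basic, percentile and BCa methods. The exact simulation code used in the study is available from the corresponding author and may differ in detail.

library(boot)

set.seed(2)
n <- 60
contaminated <- runif(n) < 0.005                  # 0.5% contamination
x <- ifelse(contaminated,
            rlnorm(n, meanlog = 6, sdlog = 0.5),  # contaminating component
            rlnorm(n, meanlog = 3, sdlog = 0.5))  # main log-normal component

stat <- function(data, idx) upper_limit_solberg(data[idx])

b <- boot(x, stat, R = 5000)                      # ordinary re-sampling, 5000 replicates
mean(b$t)                                         # bootstrapped estimate of the reference limit
boot.ci(b, conf = 0.90, type = c("norm", "basic", "perc", "bca"))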
Results

Computer simulation—bias and RMSE of reference limits

The RMSE obtained after bootstrapping for each distribution and sample size is plotted in Fig. 1a. The four distributions previously used for testing purposes ("SkewedLinnet", "ChiHorn2", "ChiHorn3", "ChiHorn4") exhibited only small RMSE for all tested algorithms and sample sizes. The log-normal distributions having contaminations, "LogNormal2" and "LogNormal3", exhibited high RMSE, with the proposed algorithms of "Solberg" and "Linnet" with no prior removal of outliers or transformation performing significantly worst. The "LogNormal1" and "MixedNormal" distributions displayed a medium performance, with the proposed algorithms of "Solberg" and "Linnet" performing significantly best.

To separate the effect of the algorithm from the effect of the bootstrapping procedure, we calculated the difference in RMSE after and before bootstrapping, and plotted these differences in Figure 1b (included in the electronic supplement only). A negative value indicates that the bootstrap procedure returned more accurate reference intervals, whereas a positive value indicates that the bootstrap procedure made them worse. For all the adjusted algorithms, including transformation or removal of outliers, bootstrapping contributes little to the returned reference limits. For the proposed algorithms of "Solberg" and "Linnet" without these adjustments, bootstrapping significantly alters the returned reference limits. For small sample sets, bootstrapping reduces reference limit bias for all distributions.
Fig. 1a Root mean squared errors (RMSE) from reference limits obtained from all tested distributions. The two log-normal distributions with contaminations (LogNormal2 and LogNormal3) display high RMSE for the two algorithms currently most commonly applied in medical biochemistry. [Figure not reproduced; panels: LogNormal1, LogNormal2, LogNormal3, MixedNormal, SkewedLinnet, ChiHorn2, ChiHorn3, ChiHorn4; x-axis: number of observations (50–120); y-axis: mean residual square error; curves: Ordinary, Ordinary + Tukey, Transform + Tukey, Transform, Ordinary + Tukey with corr, Transform + Tukey with corr; Linnet in black, Solberg in red]
When the number of observations increases in the "LogNormal2" and "LogNormal3" distributions, bootstrapping ceases to reduce bias and instead returns estimates having a larger bias than before bootstrapping. The likely cause of this phenomenon will be discussed later.
Computer simulation—bias and RMSE of confidence limits

Only four of the suggested methods of calculating confidence intervals displayed properties suited for routine use. We will thus start with a short dismissal of the two substandard methods. The normal approximation was clearly not suited for confidence interval estimation here, owing to the skewness of the tested distributions. Since the tested distributions were skewed, the non-skewed confidence interval returned by the normal approximation underestimated the calculated lower confidence limit and overestimated the calculated upper confidence limit. Studentized confidence limits also failed by showing both a large bias and a large RMSE. The probable reason behind this failure was the difficulty of expressing the standard error of each bootstrap estimate. If this problem is properly addressed, the bias and RMSE of studentized confidence limits are likely to decrease. Since the normal approximation and the studentized confidence limits were here clearly inferior to the other alternatives, they are not further discussed or illustrated.
Fig. 4a The percentile method of confidence interval calculation is displayed by the RMSE of lower confidence limits. [Figure not reproduced; panels: LogNormal1, LogNormal2, LogNormal3, MixedNormal, SkewedLinnet, ChiHorn2, ChiHorn3, ChiHorn4; x-axis: number of observations (50–120); y-axis: mean RSE of the lower limit (logarithmic scale, 0.1–50); curves: Ordinary, Ordinary + Tukey, Transform + Tukey, Transform, Ordinary + Tukey with corr, Transform + Tukey with corr; Linnet in black, Solberg in red]
The four potential candidate methods for calculating confidence limits can be arranged into two groups. The percentile and BCa methods have many similarities, the latter being an updated and corrected version of the former. The BCa and percentile methods, however, returned almost identical results, and we thus show only the results of the percentile method, as this is the method most frequently used in medical biochemistry. The two methods originating from the standard error of the bootstrap estimate, the basic method and the confidence limit indirectly calculated from the standard error, also returned similar, but not identical, results. We here chose to show only the results of the basic method, as this is the standard output of the library.

The percentile method returns lower confidence limits having low bias (Figure 2a, included in the electronic supplement only) and low RMSE (Fig. 4a) for all the distributions except the log-normal distributions having contaminations. The upper confidence limits of the percentile method are sensitive to the actual observations corresponding to the bootstrap percentiles of the distributions, and the percentile method thus exhibits a larger bias (Figure 3a, included in the electronic supplement only) and a higher RMSE (Fig. 5a), especially for the log-normal distributions having contaminations. Transforming the distributions prior to the calculations does not change the outcome, since the percentile method is not directly dependent on the distribution. Outlier removal may partly correct for a skewed distribution.
Fig. 5a The percentile method of confidence interval calculation is displayed by the RMSE of upper confidence limits. [Figure not reproduced; panels: LogNormal1, LogNormal2, LogNormal3, MixedNormal, SkewedLinnet, ChiHorn2, ChiHorn3, ChiHorn4; x-axis: number of observations (50–120); y-axis: mean RSE of the upper limit (logarithmic scale, 0.1–50); curves and colours as in Fig. 4a]
Confidence limits obtained with the basic method display a similar bias (Figures 2b and 3b, included in the electronic supplement only) and RMSE (Figures 4b and 5b, included in the electronic supplement only) to the confidence intervals obtained from the percentile method when the tested distribution is close to normality and there are no outliers. With skewed distributions and outliers present, the basic method depends on transformation and outlier removal to obtain accurate estimates of both the upper and lower confidence limits. Following these procedures, the basic method will return accurate confidence limits also under these circumstances.
Discussion

Present status

Today, non-parametric reference intervals are commonly calculated using the algorithms of "Solberg" and "Linnet"
or similar ones. By applying a bootstrapping procedure, one wishes to obtain (1) a small but significant reduction in the imprecision of the reference level estimate and (2) a confidence interval for the obtained estimate. Reference limits in medical biochemistry are calculated using ordinary bootstrapping, and confidence intervals are calculated using the percentile method. According to the current study, for the straightforward distributions "ChiHorn2", "ChiHorn3", "ChiHorn4", "SkewedLinnet", "MixedNormal", and "LogNormal1", this yields approximately correct reference limits. However, the reduction in imprecision following the bootstrap procedure is generally small, and the gain from the bootstrapping procedure lies mostly in providing a confidence interval for the estimate. For the log-normal distributions with contaminations, "LogNormal2" and
"LogNormal3", bootstrapping grossly alters the results, sometimes reducing and sometimes increasing the bias. By transformation or outlier removal, the bootstrap procedure will generally return less-biased reference limits. Confidence intervals obtained with the percentile method will be accurate for all distributions except "LogNormal2" and "LogNormal3". Transformation or outlier removal will here fail to provide accurate confidence limits with the percentile method, since this method is independent of the shape of the distribution of the bootstrap estimate. For "LogNormal2" and "LogNormal3", accurate confidence intervals may be obtained by applying the basic method.

Why, then, are the distributions "LogNormal2" and "LogNormal3" so problematic? The obvious answer is the inherent nature of the bootstrap procedure. Bootstrapping reduces error by calculating a reference limit not only from the one result corresponding to the reference limit, but also from the results surrounding it. If there are outliers in these surroundings, the outliers will affect the bootstrap estimate, and the bootstrap estimate will be worse than the original data. In the "LogNormal2" and "LogNormal3" distributions, the contamination fractions are only 0.5% and 1%, respectively, so even small fractions of outliers may be deleterious for the reliability of the bootstrap estimate.

Suggestions for more accurate bootstrapping

In most of the tested distributions, ordinary bootstrapping using the algorithms of "Linnet" and "Solberg" together with the percentile method for confidence intervals returns accurate estimates. However, in the case of a log-normal distribution with contaminations, transformation and outlier removal improve the outcome, and the basic method should be chosen for confidence interval estimation. The outcome of bootstrapping is, contrary to popular belief, dependent on the underlying distribution and on the possible presence of outliers. A skewed distribution and the presence of outliers will both result in a skewed and irregular distribution of the bootstrap estimate. A plot of the distribution of the bootstrap estimator is a valuable tool when judging the reliability of a bootstrap procedure [15]. We thus advise the user always to plot the distribution of the bootstrap estimator. If this distribution is skewed or irregular, transformation and removal of outliers should be tried and confidence intervals calculated using the basic method. We must stress that the important point here is not to achieve perfect normality of the results, but that more normal results return a more normally distributed bootstrap estimate. A second piece of advice is to compare results from different methods of confidence interval estimation. If there is agreement between the percentile method and the confidence interval obtained from the
standard error, this indicates a distribution sufficiently close to normality, and the confidence intervals can be trusted. For outlier removal, we have here used Tukey fences. This is certainly not the optimal procedure for outlier removal, since the outcome of outlier removal using Tukey fences will depend on the sample size. Modern procedures exist to identify multiple outliers at a specified probability [16], and we may include such a procedure in a future application. We have implemented all of the algorithms of the current study as a web service using R/Rpad [17], currently located at www.sjcli.org for free use. The bootstrap estimates are plotted to help the user choose and judge the reliability of the chosen bootstrap procedure.
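As an illustration of this advice (our own sketch, not the code behind the web service), the distribution of the bootstrap estimates can be inspected and the percentile and basic 90% limits compared using the boot object b from the earlier sketch:

hist(b$t, breaks = 50,
     main = "Bootstrap estimates of the upper reference limit",
     xlab = "Estimated 97.5 percentile")
qqnorm(b$t); qqline(b$t)        # skewness or irregularity is visible here

ci <- boot.ci(b, conf = 0.90, type = c("perc", "basic"))
ci$percent[4:5]                 # percentile 90% confidence limits
ci$basic[4:5]                   # basic 90% confidence limits
# Large disagreement between the two suggests transformation or outlier removal.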
Conclusion

Bootstrapping for the calculation of reference intervals in small sample sets is possible. However, the distribution of the bootstrap estimate must be plotted and carefully investigated. If the distribution is irregular or skewed, data should be transformed towards normality and outliers removed. In these cases, the basic method should be employed for confidence interval calculations. The gains of bootstrapping are small, and the first choice should be a parametric method, which is more efficient. Bootstrapping should only be tried when parametric methods have failed.

Acknowledgment Thanks to the staff at Rikshospitalet-Radiumhospitalet, who provided us access to their computers for simulations during the Easter vacation!
References

1. Rustad P, Felding P, Lahti A (2004) Proposal for guidelines to establish common biological reference intervals in large geographical areas for biochemical quantities measured frequently in serum and plasma. Clin Chem Lab Med 42:783–791. doi:10.1515/CCLM.2004.131
2. Poulsen OM, Holst E, Christensen JM (1997) Calculation and application of coverage intervals for biological reference values. A supplement to the approved IFCC recommendation (1987) on the theory of reference values. Pure Appl Chem 69:1601–1611. doi:10.1351/pac199769071601
3. Solberg HE (1995) RefVal: a program implementing the recommendations of the International Federation of Clinical Chemistry on the statistical treatment of reference values. Comput Methods Prog Bio 48:247–256. doi:10.1016/0169-2607(95)01697-X
4. Linnet K (2008) CBstat: a program for statistical analysis in clinical biochemistry. http://www.cbstat.com/ (Accessed June 2008)
5. Solberg HE (2004) The IFCC recommendation on estimation of reference intervals. The RefVal program. Clin Chem Lab Med 42:710–714. doi:10.1515/CCLM.2004.121
6. Linnet K (2000) Nonparametric estimation of reference intervals by simple and bootstrap-based procedures. Clin Chem 46:867–869
7. Henderson AR (2005) The bootstrap: a technique for data-driven statistics. Using computer-intensive analyses to explore experimental data. Clin Chim Acta 359:1–26. doi:10.1016/j.cccn.2005.04.002
8. Bjerner J, Høgetveit A, Akselberg K, et al (2008) Upper reference limits for carcinoembryonic antigen (CEA), CA125, MUC1, alfa-foeto-protein (AFP), neuron specific enolase (NSE) and CA19.9 from the NORIP study. Scand J Clin Lab Invest 68:703–713. doi:10.1080/00365510802126836
9. Bjerner J (2007) Age-dependent biochemical quantities: an approach for calculating reference intervals. Scand J Clin Lab Invest 67:707–722. doi:10.1080/00365510701342070
10. Horn PS, Pesce AJ, Copeland BE (1998) A robust approach to reference interval estimation and evaluation. Clin Chem 44:622–631
11. R Development Core Team (2007) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.r-project.org/ Accessed June 2008
12. Canty AJ (1998) An S-plus library for resampling methods. Computing Science and Statistics. In: Proceedings of the 30th symposium on the interface, pp 236–241
13. Carpenter J, Bithell J (1999) Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Stat Med 19:1141–1164. doi:10.1002/(SICI)1097-0258(20000515)19:9<1141::AID-SIM479>3.0.CO;2-F
14. Efron B (1981) Nonparametric standard errors and confidence intervals (with discussion). Can J Stat 9:139–172. doi:10.2307/3314608
15. Hesterberg T, Monaghan S, Moore DS, Clipston A, Epstein R (2003) Bootstrap methods and permutation tests. In: The practice of business statistics. WH Freeman and Company, San Francisco
16. Schwertman NC, Silva RD (2007) Identifying outliers with sequential fences. Comput Stat Data Anal 51:3800–3810. doi:10.1016/j.csda.2006.01.019
17. Short T (2008) Rpad: interactive, web-based analysis package, webpage and GUI designer. http://www.r-project.org/ and http://www.rpad.org/rpad. Accessed June 2008