CLIN. CHEM. 34/12, 2494-2498 (1988)

Chemometrics: Measurement Reliability

Kicab Castañeda-Méndez

Applying the principles of chemometrics can lead to development of a powerful, simple, comprehensive system for characterizing the measurement reliability of analytical processes. The system described here consists of a format for specifications, definitions, and decision rules for evaluating and comparing analytical processes. Two cases illustrate its advantages over the use of precision, accuracy, total error, and regression statistics for such evaluations.

"Chemometrics is defined as the chemical discipline that uses mathematical and statistical methods (a) to design or select optimal measurement procedures and experiments and (b) to provide maximum chemical information by analyzing chemical data. Chemometrics is the chemical discipline that uses mathematical and statistical methods for the obtention [sic] in the optimal way of relevant information on material systems" (1).

The purpose of an analytical process is to measure the concentration of an analyte in a medium. The only relevant question concerning the purpose of measurement is: how well, or how reliably, does it measure? Applying chemometrics to performance characterization of analytical processes leads to a new, comprehensive measurement-reliability characterization system. Necessary and sufficient characteristics and decision rules for evaluation and comparison are logically deduced from the definitions. Two examples illustrate the power, simplicity, and advantages of this reliability characterization system over current statistical methods that are based on the use of precision, accuracy, total error, and regression statistics (2-8).

Reliability

An analytical process that is perfectly reliable in its measurement always produces measurements exactly equal to the actual concentration. In practice, an analytical process is characterized by the frequency F with which it produces measurements within an allowable error E of a concentration C (Definition 1); it is acceptably reliable if F is at least a specified minimum frequency F0 (Rule 1), and one process is more reliable than another if its frequency is higher (Rule 2). Example: The frequencies with which glucose meters 1-3 produce values within E0 = [0.95, 1.10] at C = 1.00 g/L are F1 = 0.96, F2 = 0.88, and F3 = 0.92. If F0 = 0.95, meter 1 is reliable but meters 2-3 are not (Rule 1). Meter 1 is more reliable than meter 3, which is more reliable than meter 2 (Rule 2).
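As a minimal computational sketch of this notion of reliability and of Rules 1 and 2 (not part of the original system description; the replicate readings, function names, and values below are hypothetical), the frequency F is simply the fraction of measurements that fall within the allowable error E0 at a known concentration, compared against the minimum acceptable frequency F0:

```python
# Sketch of Definition 1 and Rules 1-2: reliability as the frequency F of
# measurements falling within the allowable error E0 at concentration C.
# Hypothetical names and simulated readings, not data from the article.
from typing import Sequence, Tuple


def reliability(readings: Sequence[float], e0: Tuple[float, float]) -> float:
    """Frequency F with which readings fall inside the closed interval E0."""
    lo, hi = e0
    inside = sum(1 for x in readings if lo <= x <= hi)
    return inside / len(readings)


def is_reliable(f: float, f0: float) -> bool:
    """Rule 1 (acceptability): the process is reliable if F >= F0."""
    return f >= f0


if __name__ == "__main__":
    e0 = (0.95, 1.10)   # allowable error E0 at C = 1.00 g/L (from the example)
    f0 = 0.95           # minimum acceptable frequency F0
    # Hypothetical replicate readings for two meters at C = 1.00 g/L.
    readings = {
        "meter 1": [0.97, 1.02, 0.99, 1.05, 0.96, 1.01, 1.08, 0.98, 1.00, 1.03],
        "meter 2": [0.90, 1.12, 0.99, 1.15, 0.96, 1.01, 0.93, 0.98, 1.00, 1.20],
    }
    freqs = {name: reliability(vals, e0) for name, vals in readings.items()}
    for name, f in freqs.items():
        print(name, f, "reliable" if is_reliable(f, f0) else "not reliable")
    # Rule 2 (comparability): the process with the higher F is more reliable.
    print("more reliable:", max(freqs, key=freqs.get))
```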

Characterization

Unique and complete reliability characterization of an analytical process need refer only to different values of C, E, and F. There are three cases: (a) the minimum F for a given E and range of C, (b) all C values for a given F and E, and (c) the minimum E for a given F and range of C. Case a is described by Definition 1 and Rules 1 and 2 extended to all C values of interest. Cases b and c are defined below by Definitions 3 and 4, respectively.

Definition 3: A range of reliability is an interval of concentrations at which an analytical process has acceptable reliability.³ Example: Glucose meter 1 is reliable in the range [CL, CU] = [0.40, 4.00] if F ≥ F0 = 0.95 for 0.40 ≤ C ≤ 4.00 g/L.

If X > 0, the analyte is present in the medium. Definition 5: The allowable error for presence is E0 = (0, ∞); the allowable error for absence is E0 = [0, 0]. Note: brackets, [ or ], denote inclusion, whereas parentheses, ( or ), denote exclusion of interval endpoints; e.g., (3, 4] means 4 is included but 3 is not.

Definition 6: Negativity of an analyte is the reliability of recognizing its absence; positivity of an analyte is the reliability of recognizing its presence.

³The range is a single concentration value when CL = CU = C. Symmetry is required for uniqueness: an allowable error E = [EL, EU] is symmetrical about C when C - EL = EU - C.
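To illustrate how a range of reliability (Definition 3, with the symmetric allowable error of footnote 3) might be checked in practice, the sketch below applies the Rule 1 criterion at each of several tested concentrations; the concentrations, replicate readings, ±15% error band, and all names are hypothetical illustrations, not the article's data:

```python
# Sketch of Definition 3: a range of concentrations is a range of reliability
# if Rule 1 (F >= F0) holds at every concentration tested within it.
# Readings and the symmetric +/-15% allowable error are hypothetical.
from typing import Dict, Sequence


def frequency_within(readings: Sequence[float], c: float, rel_error: float) -> float:
    """F at concentration c for a symmetric allowable error C*(1 +/- rel_error)."""
    lo, hi = c * (1 - rel_error), c * (1 + rel_error)
    return sum(1 for x in readings if lo <= x <= hi) / len(readings)


def range_of_reliability(data: Dict[float, Sequence[float]],
                         rel_error: float, f0: float) -> bool:
    """True if F >= F0 at every tested concentration (Rules 1-2 extended)."""
    return all(frequency_within(vals, c, rel_error) >= f0
               for c, vals in data.items())


if __name__ == "__main__":
    # Hypothetical replicates at a few concentrations spanning [0.40, 4.00] g/L.
    data = {
        0.40: [0.38, 0.41, 0.42, 0.39, 0.40],
        1.00: [0.97, 1.02, 0.99, 1.05, 1.01],
        4.00: [3.90, 4.10, 4.05, 3.85, 4.02],
    }
    print(range_of_reliability(data, rel_error=0.15, f0=0.95))
```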

Rules 1 and 2 apply to negativity and positivity because they merely define reliability for special cases of E0. Example: At C = 0.15 g/L, X > 0 for 99% and 95% of the time for meters 1 and 2, respectively; meter 2 has less positivity than meter 1 (Rule 2).

Definition 7: The limit of detection is the smallest concentration whose presence can be recognized. Rule 5: CD is the limit of detection, given F0 and E0 = (0, ∞), if, and only if, F ≥ F0. Rule 6: One analytical process has a lower limit of detection than a second analytical process, for a given F0, if, and only if, CD1 < CD2.

Both frequencies exceed 0.95 = F0, so both are acceptable (Rule 1). Rule 2 addresses comparability: F2 > F1, so meter 2 is superior to meter 1, relative to the medical criteria. At a 95% confidence level (22), F2 = 0.97 whereas F1 = 0.92: meter 2 is still acceptable but meter 1 is not.

Figure 2 provides other relevant information. These graphs show at which concentrations it would be essential to gather more data to address reliability. For example, meter 2 does not give readings that are linearly related to concentration. The curvature indicates that concentrations at both ends of the range would fail specifications before other concentrations would. Contriving samples at 0.40-0.50 g/L and at 3.80-4.00 g/L would provide information about the acceptability at those concentrations.

Fig. 2. % difference vs reference, with ±15% allowable-error limits: (top) meter 1 (simulated data); (bottom) meter 2 (simulated data)
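A plot such as Figure 2 reduces to a check of each paired result against the allowable-error limits. The sketch below uses invented reference/meter pairs (not the simulated data behind the figure) and flags percent differences outside ±15%, which is how failures concentrated at the ends of the range would show up:

```python
# Sketch of the Figure 2 idea: percent difference from a reference method,
# flagged against +/-15% allowable-error limits. Paired values are invented
# for illustration only.
paired = [  # (reference g/L, meter reading g/L)
    (0.40, 0.48), (0.80, 0.86), (1.50, 1.55), (2.50, 2.45),
    (3.20, 3.10), (3.80, 3.20), (4.00, 3.30),
]

limit = 15.0  # allowable error, percent

for ref, meas in paired:
    pct_diff = 100.0 * (meas - ref) / ref
    status = "ok" if abs(pct_diff) <= limit else "outside limits"
    print(f"ref={ref:4.2f}  %diff={pct_diff:+6.1f}  {status}")
# Failures clustered at the ends of the range would suggest, as in the text,
# testing more samples near 0.40-0.50 and 3.80-4.00 g/L.
```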


From the graph one can conclude that, if meter 2 is reliable at C = 0.40-0.50 g/L and at C = 3.80-4.00 g/L, then it is reliable at all other concentrations. The curvature also indicates that the range of reliability may not be extendable for meter 2. A similar analysis for meter 1 shows that the low concentrations should be tested.

One other useful and relevant piece of information is the resolution of the two analytical processes. Figure 3 shows a plot of percentiles of the absolute percent deviations from the reference method for each measurement from each method. Various point estimates of ER can be read from this graph. For example, the 95th percentiles (F0 = 0.95) show that ER2 = C ± 10% and ER1 = C ± 14%. Hence, meter 2 would still be reliable at E0 = C ± 10% but meter 1 would not. In general, if p is the percentile, then the corresponding ER is the resolution for F0 = p.
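Reading the resolution off Figure 3 amounts to taking a percentile of the absolute percent deviations. A small sketch of that calculation follows; the deviations and the nearest-rank percentile rule are illustrative assumptions, not the article's data or its exact estimator:

```python
# Sketch of the Figure 3 / resolution idea: for F0 = p, the resolution ER is
# the p-th percentile of the absolute percent deviations from the reference.
# The deviations below are invented for illustration.
import math


def percentile(sorted_vals, p):
    """Simple nearest-rank percentile (0 < p <= 1) on a sorted list."""
    k = max(0, math.ceil(p * len(sorted_vals)) - 1)
    return sorted_vals[k]


abs_pct_dev = sorted([2.1, 3.4, 1.0, 7.8, 5.5, 9.9, 4.2, 12.5, 6.1, 8.3,
                      3.9, 10.4, 2.8, 6.7, 11.2, 5.0, 7.1, 9.0, 4.8, 13.6])

f0 = 0.95
er = percentile(abs_pct_dev, f0)  # e.g., the 95th percentile
print(f"resolution at F0={f0}: within +/-{er}% of the reference")
```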

Fig. 1. Measurement vs reference, with identity line: (top) meter 1 (simulated data); (bottom) meter 2 (simulated data)

Critique

Complete regression analysis as well as precision (within, between, and total for runs and laboratories) and accuracy analyses could have been used in Case 1, because there were several levels of concentration. The same ambiguities or incorrect conclusions would have occurred as in Case 2.


Fig. 3. Percentiles of absolute % difference, with 15% allowable-error limit: (X) meter 1, (O) meter 2 (simulated data)

The reason precision, accuracy, and total error fail to correctly characterize measurement reliability is explained in reference 9. The same failing applies to regression statistics, because slope and intercept are accuracy statistics, and standard error and the correlation coefficient (23) are precision statistics. Evidently, linearity, per se or relative to the identity line, is neither good nor bad nor informative with respect to reliability.

The reliability method is always applicable, unlike regression statistics, because it requires fewer assumptions. It does not require an assumption of homoscedasticity (5, 21), nor additional data to estimate the variabilities at several representative concentrations to achieve homoscedasticity through data transformation (6-8). Linearity, which would introduce another error into reliability estimates, as evidenced by the hyperbolic curves for confidence and prediction intervals of a regression line (5-8, 21), is not required. Thus, meter 2 can be evaluated by this method but not by the methods in references 5-8 and 21.

From a practical perspective, the fundamental, relevant question concerning performance is: for a particular purpose, how reliably does an analytical process produce useful results? The three components of measurement reliability (Definition 1) are included in this question: purpose (concentration), usefulness (allowable error), and reliability (frequency). The answer must include these components, and need not include others. By the principles of chemometrics, Definitions 1-8 and Rules 1-7 provide the simplest and most direct answer.

Current statistical approaches (2-8) to analytical-process characterization do not directly or adequately address the measurement reliability or acceptability of analytical processes. Reliability is a probability, as is frequency, so frequency is a measure of reliability. The performance characteristics (slope, intercept, correlation coefficient, standard error of the line, imprecision of the slope and intercept, precision and accuracy, and total error) and the statistical-confidence⁸ definitions of limit of detection and quantification (24-26) cannot be measures of reliability: they are not probabilities. The purpose of statistics is to provide estimates of reliability and resolution, not the acceptability criteria.

⁸Statistical confidence is not a probability (15), because it is not consistent with statistical axioms. Probabilities are additive, for example, whereas statistical confidences are not.

These estimates are then compared with practical specifications for decisions on acceptability, or with other estimates for decisions on selection.

Finally, it seems odd that, among professionals, complicated statistical analyses are used to evaluate medical devices (e.g., references 2-8), but when it comes to telling the rest of the world how good we are, we use a simpler, more direct, easier to understand, and exact analysis: how often the device, the laboratory personnel, etc. produce reliable and useful results (e.g., reference 17).

Conclusions

Experts in the application area of an analytical process should take the responsibility to publicize allowable errors, minimal frequencies, and ranges of reliability. For example, physicians do not make decisions by knowing the precision and accuracy of their analytical processes (27-34), but by knowing whether a result is within normal limits or whether a change has occurred (29-33). These decisions are based on the probability (frequency) of an acceptable result within an allowable error.

Measurement reliability is the essential performance characteristic of analytical processes. Current statistical approaches to characterization of analytical processes either do not include measurement reliability or do not address it adequately. The unified, comprehensive chemometric system described here does address measurement reliability. It is founded on a practical and essential definition of reliability that provides only relevant information, and two graphs (Figures 2 and 3) can show that information.

The major advantages of a single, comprehensive methodology for evaluation and characterization of analytical processes are consistent characterization, evaluation, and comparison; reduced or eliminated confusion about concepts, language, definitions, and notation; and a single concept that is applicable to all cases. The benefits are directness, exactness, simplicity, and clarity. The reliability characterization system developed here, based on simple definitions that logically imply the necessary and sufficient characterization and decision rules for evaluating and comparing analytical processes, fills that void, has those advantages, and provides those benefits.

References

1. Frank IE, Kowalski BR. Chemometrics. Anal Chem 1982;54:232R-43R.
2. Westgard JO. Precision and accuracy: concepts and assessment by method evaluation testing [Review]. Crit Rev Clin Lab Sci 1981;13:283-330.
3. Westgard JO, Barry PL. Cost-effective quality control: managing the quality and productivity of analytical processes. Washington, DC: AACC Press, 1986.
4. Information for authors. Clin Chem 1988;34:1-4.
5. National Committee for Clinical Laboratory Standards. Standards, Guidelines, Reports EP2-T, EP3-P, EP4-T, EP5-T, EP6-P, EP7-P, EP9-P, EP10-P. Villanova, PA: NCCLS, 1979-1986.
6. Gerbet D, Richardot P, Auget J-L, et al. New statistical approach in biochemical method-comparison studies using Westlake's procedure, and its application to continuous-flow, centrifugal analysis, and multi-layer film analysis techniques. Clin Chem 1983;29:1131-6.
7. Passing H, Bablok W. A new biometrical procedure for testing the equality of measurements from two different analytical methods: application of linear regression procedures for method comparison studies in clinical chemistry, part I. J Clin Chem Clin Biochem 1983;21:709-20.
8. Bookbinder MJ, Panosian KJ. Using the coefficient of correlation in method-comparison studies. Clin Chem 1987;33:1170-6.
9. Castañeda-Méndez K. Medical utility frequency. Clin Chem 1987;31:221-2.
10. Galen RS, Gambino SR. Beyond normality: the predictive value and efficiency of medical diagnosis. New York: John Wiley & Sons, 1975.
11. Taylor JK. Quality assurance of chemical measurements. Chelsea, MI: Lewis Publishers, 1987.
12. Büttner J, Borth R, Boutwell JH, Broughton PMG. International Federation of Clinical Chemistry Committee on Standards. Provisional recommendations on quality control in clinical chemistry. I. General principles and terminology. Clin Chem 1976;22:532-40.
13. Pharmaceutical Manufacturers Association, Quality Control Section, Compendial Assay Committee. Current concepts for the validation of compendial assays. Pharmacopeial Forum 1986;March-April:1241-5.
14. ACS Committee on Environmental Improvement. Guidelines for data acquisition and data quality evaluation in environmental chemistry. Anal Chem 1980;52:2242-9.
15. Woodroofe M. Probability with applications. New York: McGraw-Hill, 1975.
16. Habig RL. It's time to take the initiative [Editorial]. Clin Chem 1987;33:1682.
17. Frings CS, White RM, Danielle JB. Status of drugs-of-abuse testing in urine: an AACC study. Clin Chem 1987;33:1683-6.
18. Ranch J, Dahlmeier B, Wolf M. In-hospital bedside glucose monitoring: a program that works. Clin Diabetes 1986;4:73-7.
19. Gifford-Jorgensen RA, Borchert J, Hassanein H, et al. Comparison of five glucose meters for self-monitoring of blood glucose by diabetic patients. Diabetes Care 1986;9:70-6.
20. Anscombe FJ. Graphs in statistical analysis. Am Statistician 1973;27:17-21.
21. Garber CC, Carey RN. Laboratory statistics. In: Kaplan LA, Pesce AJ, eds. Clinical chemistry. St. Louis: CV Mosby Co., 1984: chapter 17.
22. Hollander M, Wolfe DA. Nonparametric statistical methods. New York: John Wiley and Sons, 1973.
23. Rodgers JL, Nicewander WA. Thirteen ways to look at the correlation coefficient. Am Statistician 1988;42:59-66.
24. Long GL, Winefordner JD. Limits of detection. Anal Chem 1983;55:712A-21A.
25. Rodbard D. Statistical estimation of the minimal detectable concentration ("sensitivity") for radioligand assays. Anal Biochem 1978;90:1-12.
26. Glaser JA, Foerst DL, McKee GD, Quave SA, Budde WL. Trace analyses for wastewaters. Environ Sci Technol 1981;15:1426-35.
27. Barnett RN. Medical significance of laboratory results. Am J Clin Pathol 1968;50:671-6.
28. Skendzel LP. How physicians use laboratory tests. J Am Med Assoc 1978;239:1077-80.
29. Barrett AE, Cameron SJ, Fraser CG, Penberthy LA, Shand KL. A clinical view of analytical goals in clinical biochemistry. J Clin Pathol 1979;32:893-6.
30. Link K, Centor R, Buchsbaum D, Witherspoon J. Why physicians don't pursue abnormal laboratory tests: an investigation of hypercalcemia and the follow-up of abnormal test results. Hum Pathol 1984;15:75-8.
31. Skendzel LP, Barnett RN, Platt R. Medically useful criteria for analytical performance of laboratory tests. Am J Clin Pathol 1985;83:200-5.
32. Kassirer JP, Gorry GA. Clinical problem solving: a behavioral analysis. Ann Intern Med 1978;89:245-55.
33. Statland BE. Decision levels. In: Clinical decision levels for lab tests. Oradell, NJ: Medical Economics Books, 1983.
34. Siest G, Henny J, Schiele F, et al., eds. Interpretation of clinical laboratory tests. Davis, CA: Biomedical Publications, 1985.

30. Link K, Centor R, Buchsbaurn D, WitherspoonJ. Why physicians don’t pursue abnormal laboratory tests: an investigation of hypercalcernia and the follow-up of abnormal test results. Hum Pathol 1984;15:75-8. 31. Skendzel 12, Barnett RN, Platt R. Medically useful criteria for analytical performance of laboratory tests. Am J Clin Pathol 1985;83:200-5. 32. Kassirer JP, Gerry GA. Clinical problem-solving a behavioral analysis. Ann Intern Med 1978;89:245-55. 33. Statland B. Decision levels.In: Clinical decision levels for lab tests. Oradell, NJ: Medical Economics Books, 1983. 34. Siest G, Henny J, Schiele F, et al., eds.Interpretation of clinical laboratory tests. Davis, CA: BiomedicalPublications,1985.