Optimizing the balance between false positive and false negative error probabilities of confirmatory methods for the detection of veterinary drug residues†
Waldo J. de Boer,*ᵃ Hilko van der Voet,ᵃ Wil G. de Ruig,ᵇ J. A. (Hans) van Rhijn,ᵇ Kevin M. Cooper,ᶜ D. Glenn Kennedy,ᶜ Raj K. P. Patel,ᵈ Sharon Porter,ᵈ Thea Reuvers,ᵉ Victoria Marcos,ᵉ Patricia Muñoz,ᵉ Jaume Bosch,ᶠ Pilar Rodríguezᶠ and Josep M. Grasesᶠ
ᵃ Centre for Biometry Wageningen (CBW), P.O. Box 16, NL-6700 AA Wageningen, The Netherlands
ᵇ DLO-State Institute for Quality Control of Agricultural Products (RIKILT-DLO), P.O. Box 230, NL-6700 AE Wageningen, The Netherlands
ᶜ Veterinary Sciences Division, Department of Agriculture for Northern Ireland, Stoney Road, Stormont, Belfast, UK BT4 3SD
ᵈ Central Veterinary Laboratory (CVL-VLA), New Haw, Addlestone, Weybridge, Surrey, UK KT15 3NB
ᵉ Centro Nacional de Alimentacion (CNAN-ISCIII), Carretera Majadahonda a Pozuelo Km 2.2, 28220 Majadahonda, Madrid, Spain
ᶠ Laboratori Agroalimentari de la Generalitat de Catalunya (LAGC), Apartat 12, 08340 Vilassar de Mar, Barcelona, Spain
Received 9th September 1998, Accepted 1st December 1998
GC-MS data on veterinary drug residues in bovine urine are used for controlling the illegal practice of fattening cattle. According to current detection criteria, peak patterns of preferably four ions should agree within 10 or 20% with a corresponding standard pattern. These criteria are rigid, rather arbitrary and do not match daily practice. A new model, based on multivariate modeling of log peak abundance ratios, provides a theoretical basis for the identification of analytes and optimizes the balance between the avoidance of false positives and false negatives. The performance of the model is demonstrated on data provided by five laboratories, each supplying GC-MS measurements on the detection of clenbuterol, dienestrol and 19β-nortestosterone in urine. The proposed model performs better than confirmation using the current criteria and provides a statistical basis for inspection criteria in terms of error probabilities.

European legislation prohibits the administration, for the purposes of growth promotion, of β-agonists and other substances having a hormonal action to food-producing animals. Legislation is enforced through inspection procedures based on chemical analyses: urine samples are examined for drug residues and, depending on the outcome, classified as negative or positive. In both cases the conclusion can be true or false, yielding four categories: true positive, true negative, false positive and false negative results. This paper describes results from a project which aimed at estimating the probability of false positive (α) and false negative (β) results for given inspection procedures, as well as providing strategies for improving the criteria. Within the European Union (EU), Commission Decisions 93/256/EEC¹ and 93/257/EEC² establish criteria for the performance of screening and confirmatory analyses to be used in national residues control programmes.
These criteria are used to establish whether a sample is positive or negative for a particular analyte. In this paper we call an analyte detected when not only the presence, but also the identity, is confirmed by the method of analysis. One of the most frequently used methods of confirmatory analysis is low-resolution gas chromatography-mass spectrometry (GC-MS). For low-resolution GC-MS analysis the EU criteria are: (i) that the relative retention time should be within 0.5% of a standard obtained during the same run; (ii) that the intensity of preferably at least four diagnostic ions should be measured; and (iii) that the peak ratios of those ions or relative abundances should match those of a standard analyte preferably within a margin of ±10% for electron impact (EI) mode or ±20% for chemical ionisation (CI) mode. Beyond what the legislation requires, the concentration is often derived from the largest peak area by calibration using a standard, and only concentrations higher than a certain level are considered to provide sufficient evidence for confirmation.

However, straightforward use of the proposed criteria is complicated for a number of reasons. The EU criteria are too rigid and do not match common laboratory practice. Even under highly controlled conditions laboratories often do not meet the test criteria for GC-MS measurements. EU criteria for mass spectrometry tend to be arbitrary, and aim to avoid false positive results.³ Generally, there is no mechanism to control the rate of false negatives. In practice, lower limits on the estimated concentration are imposed to avoid false positive results for low-level samples.

In 1994 a monitoring study was started in three EU Member States to examine the analytical strategy and the quality of inspection procedures in order to offer a harmonized and cost-effective system for the inspection of residues of veterinary drugs. The research also aimed to provide strategies to quantify the risk of false positive and false negative results in inspection procedures. Five laboratories provided GC-MS measurement data on growth-promoting agents in bovine urine, with analyses spread over a period of three years. The laboratories also provided screening, confirmation and inspection results in terms of numbers of negative and positive samples. The main aim was to collect a large set of data, representing variability originating from different sources and different analytes, as the basis for subsequent multivariate modeling.

In this paper the emphasis is on the performance of a new method of analyzing GC-MS measurement data rather than on the theory behind it. A full description of the statistical model has been given elsewhere.⁴ In the present paper, a summary of the performance of the five laboratories for each of the three analytes is presented and discussed.

† Presented at the Third International Symposium on Hormone and Veterinary Drug Residue Analysis, Bruges, Belgium, June 2–5, 1998.
Analyst, 1999, 124, 109–114
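The three EU criteria for low-resolution GC-MS quoted in the introduction can be sketched as a simple pass/fail check. This is an illustration only, not the procedure of any participating laboratory: the function name and arguments are hypothetical, ratios are taken relative to the base peak of the standard, and the ±10%/±20% margins are assumed to be relative to the standard's ratio (the legislation could also be read in other ways).

```python
def meets_eu_criteria(sample, standard, rrt_sample, rrt_standard,
                      mode="EI", min_ions=4):
    """Hypothetical check of EU criteria (i)-(iii) for one sample.

    sample, standard: peak abundances of the same diagnostic ions, same order.
    rrt_sample, rrt_standard: relative retention times from the same run.
    """
    tolerance = {"EI": 0.10, "CI": 0.20}[mode]

    # (i) relative retention time within 0.5% of the standard
    if abs(rrt_sample - rrt_standard) > 0.005 * rrt_standard:
        return False

    # (ii) preferably at least four diagnostic ions measured
    if len(sample) < min_ions or len(sample) != len(standard):
        return False

    # (iii) ion ratios (relative to the standard's base peak) within tolerance
    base = max(range(len(standard)), key=lambda i: standard[i])
    if sample[base] <= 0:
        return False
    for i in range(len(standard)):
        if i == base:
            continue
        ratio_sample = sample[i] / sample[base]
        ratio_standard = standard[i] / standard[base]
        # assumption: the margin is relative to the standard's ratio
        if abs(ratio_sample - ratio_standard) > tolerance * ratio_standard:
            return False
    return True
```

With a standard pattern of `[100, 50, 25, 10]`, a sample pattern of `[200, 116, 49, 21]` fails in EI mode (the second ratio deviates by 16%) but passes under the wider CI margin, which illustrates how sharply the outcome depends on a fixed cut-off.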
Test materials
Five participating laboratories in the UK, Spain and The Netherlands provided data over three years. Each laboratory used its own apparatus and Standard Operating Procedures (SOPs) to analyze the samples. Three analytes were studied: clenbuterol, dienestrol and 19β-nortestosterone in two types of sample: spiked and incurred urine samples. For both sample types, the test materials were prepared centrally and lyophilized prior to distribution to the participating laboratories. All laboratories used known negative urine samples to reconstitute the samples prior to analysis. Laboratories were instructed to ensure that the urine samples chosen to reconstitute the test materials were representative of the population of urine samples encountered in their local residues control programmes.

For each of the three analytes the samples were stratified into ten rounds, each of which consisted of ten series, each of which consisted of six samples. The samples (600 for each analyte) were distributed to the participating laboratories over a three-year period. The samples were coded to ensure that the participating laboratories were unaware of the analyte concentration in any individual sample. Six rounds (1, 2, 4, 5, 7 and 8) used spiked urine samples, and four rounds (3, 6, 9 and 10) used incurred urine samples. The incurred samples were prepared by the administration of growth-promoting doses of each of the anabolic agents to individual cattle. Clenbuterol was administered orally, whereas dienestrol and 19β-nortestosterone were administered by intramuscular injection. Urine samples were collected and analyzed for drug concentration before being transported at −20 °C to a central point for the preparation of the test materials.

The target analyte concentrations in the reconstituted test materials were chosen a priori to ensure that the samples containing the highest analyte concentration should be found positive by the participating laboratories in most cases. For clenbuterol, nominal concentrations of 0, 0.1, 0.2, 0.5, 1 and 2 µg kg⁻¹ were chosen, while for both dienestrol and 19β-nortestosterone concentrations of 0, 0.2, 0.5, 1, 2 and 5 µg kg⁻¹ were chosen.

Table 1 Methods of analysis

Laboratory  Analyte               Screening method  Confirmatory method  GC-MS peaks at m/z (a)      Internal standard at m/z
1           Clenbuterol           -                 HPLC                 -                           -
1           Dienestrol            -                 GC-MS CI             381, 395, 410*              -
1           19β-Nortestosterone   -                 GC-MS CI             215, 256*, 331, 346         -
2           Clenbuterol           ELISA             GC-MS EI             142, 331*, 333, 346         -
2           Dienestrol            RIA               GC-MS EI             317, 395, 410*, 411         -
2           19β-Nortestosterone   RIA               GC-MS EI             256*, 290, 221, 346         -
3           Clenbuterol           ELISA             GC-MS EI             86*, 187, 243, 262, 264     72
3           Dienestrol            GC-MS EI          GC-MS EI             244, 381, 395, 410*         435
3           19β-Nortestosterone   GC-MS EI          GC-MS EI             287, 403, 418*, 419         435
4           Clenbuterol           EIA               GC-MS EI             349*, 351; 391*, 393        355; 397
4           Dienestrol            -                 GC-MS EI             179, 381, 395, 410*         419
4           19β-Nortestosterone   -                 GC-MS EI             215, 256*, 331, 346         419
5           Clenbuterol           ELISA             GC-MS EI             243*, 285, 300              306
5           Dienestrol            -                 HR-GC-MS EI          658                         660
5           19β-Nortestosterone   ELISA             HR-GC-MS EI          666                         669

(a) The base peak is marked with an asterisk.

Analysis of test materials
The six samples that constituted a series were (usually) analyzed by each participating laboratory in a single analytical batch. Each batch of samples was accompanied by analytical standards, calibration or reference samples with known concentrations, as prescribed in the individual SOPs used by each of the laboratories. Samples were subjected to analysis exactly as if they had been official control samples. In the majority of cases test materials were subjected to an initial screening test (see Table 1). Screening methods are generally inexpensive and rapid and are designed to prevent false negative results. However, unlike routine samples, all of the test materials were subjected to a confirmatory test. Confirmatory analyses are expensive, more complicated and designed to avoid false positive results. For confirmation, low-resolution GC-MS was widely used, with either EI or CI detection. One participant used HPLC for confirmatory analysis of clenbuterol (laboratory 1). Another participant used high-resolution GC-MS for the analysis of dienestrol and 19β-nortestosterone (laboratory 5), a technique not covered by current EU legislation.¹ In the majority of cases abundances were recorded in EI mode. Most laboratories monitored several diagnostic ions, except for laboratory 5 for dienestrol and 19β-nortestosterone. For clenbuterol, laboratory 4 performed GC-MS using two different derivatives simultaneously and, accordingly, two internal standards (see Table 1).
EU and laboratory-specific criteria
It was found that most participating laboratories developed their own criteria on which to base their confirmatory step. For example, one approach incorporated the standard deviation associated with the ion ratio, calculated from repeated analysis of standards, in the calculation of the range of ion ratios accepted as indicative of the presence of the analyte. Adapted criteria differed widely between laboratories.

Multivariate log ratio model
Some of the indicated problems related to GC-MS analysis of drug residues may be solved by using a statistical approach. Van der Voet et al.⁴ proposed a multivariate detection model based on the primary abundance measurements. The essentials of the multivariate model are described in the Appendix. The idea behind the model is that deviations in peak ratios increase at low analyte concentrations. At high concentrations peak ratios stabilize, except for random deviations caused by measurement error, within-run variability, the presence of interfering compounds and other sources. The relative error of the peak ratios may be expressed by means of the relative standard deviation, which is approximately equal to the standard deviation after a log transformation of the peak ratios. The variability of the log peak ratios was determined, and a non-linear relationship between variability and abundance level of the analyte was incorporated in the decision as to whether samples were classified as positive or negative. The test relies on method- and/or laboratory-specific circumstances represented by a tolerance parameter ε, the value of which should be specified externally. In this paper, a robust distribution-free estimate of the standard deviation, based on the median absolute deviation,⁵ was used to estimate the dispersion of log ratios Qₖ at the investigated concentrations. The estimate is:

\[ s_k = 0.67 \times \mathrm{med}_k \, \lvert Q_k - \mathrm{med}_k \, Q_k \rvert \qquad (1) \]

where med denotes the median. In this paper a significance level (false positive rate) α = 0.01 was used throughout. After fitting the models, power curves were constructed as follows: for a range of peak abundances, critical values of the test statistic were calculated from a non-central chi-squared distribution [see Appendix, eqn. (a.6)]. Under H₁, corresponding probabilities 1 − β (true positive rate) were derived from a central chi-squared distribution [eqn. (a.7)].

Calibration data
The dataset was split into two parts and the first half, up to a maximum of 50 series, was used to calibrate the model. Log peak ratios were calculated, taking the largest analyte peak as denominator peak. For each run reference values mₖ were calculated, taking the mean of the log ratios for the reference samples with concentrations > 1 µg kg⁻¹.

Validation data
The model was validated using the second half of the data. Standardized log ratio statistics were calculated as before. For each sample a multivariate test was performed. Results depend on the tolerance parameter ε, for which no value is prescribed at present. Therefore, a value of ε was assessed by performing the multivariate test for a set of increasing ε values: ε was taken as the highest value giving no false positives for the blanks.

All calculations in this paper were performed by using the statistical package Genstat.⁶
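The calibration step can be sketched as follows: form log peak ratios against the largest analyte peak, subtract the run's reference values, and summarize the dispersion with a median-based scale estimate. This is a minimal sketch under stated assumptions, not the Genstat code used in the study. All function names are hypothetical, and the scale factor is deliberately left as an argument: eqn. (1) as printed multiplies the median absolute deviation by 0.67, whereas the common normal-consistency convention divides it by 0.6745, and this sketch does not settle which was intended.

```python
import math
import statistics

def log_ratios(abundances, denom_index):
    """ln(A_k / A_p) for every peak except the denominator peak A_p."""
    a_p = abundances[denom_index]
    return [math.log(a / a_p)
            for i, a in enumerate(abundances) if i != denom_index]

def standardized_log_ratios(abundances, reference_means, denom_index):
    """Q_k = y_k - m_k, with m_k the run's reference log ratios."""
    y = log_ratios(abundances, denom_index)
    return [y_k - m_k for y_k, m_k in zip(y, reference_means)]

def mad_scale(values, factor):
    """factor * median(|v - median(v)|); the factor is the caller's choice."""
    med = statistics.median(values)
    deviations = [abs(v - med) for v in values]
    return factor * statistics.median(deviations)
```

For example, `mad_scale(q_values, factor=0.67)` follows the constant as printed in eqn. (1), while `factor=1 / 0.6745` gives the conventional robust estimate of a normal standard deviation.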
Results
The number of series analyzed for each analyte within each laboratory ranged from 63 to 100. For laboratory 2, the first three rounds (30 series) of results for dienestrol and 19β-nortestosterone were excluded from analysis because from round 4 onwards a different ion fragment was monitored. Laboratory 4 suffered from start-up problems in the determination of dienestrol and 19β-nortestosterone in the first two rounds. For all laboratories, further series were excluded for the following reasons: improbable abundances (e.g., negative or extremely high values), interchanged samples, missing samples, lost samples, failures, etc.

Table 2 summarizes the results of the confirmatory analysis of each analyte reported by the laboratories and gives the false positive and true positive rates. Fig. 1 shows the operating characteristic (OC) curves for the confirmation results as delivered by the laboratories and, except for laboratory 1 (clenbuterol) and laboratory 5 (hormones), for the confirmation results after applying EU criteria. The figures in the plots are based on the number of validation samples, that is, roughly half the number of series summarized in Table 2. Performance differs widely between laboratories. It is also apparent that some laboratories used their own criteria, whereas others applied strict EU criteria. It should be kept in mind that OC curves are not to be compared between laboratories, because laboratories analyzed different samples (based on local urine matrix) and because of differences between laboratories in the application of EU criteria. Few false positive results were reported. In general, all laboratories reported false negative results even at the highest nominal concentrations, but results differed considerably. For clenbuterol, all laboratories performed similarly, with the exception of laboratory 2.
For hormones, OC curves showed much more variability: at 1–5 µg kg⁻¹ laboratories 1 and 5 detected most positive samples, laboratories 2 and 4 failed to identify the presence of the analyte and laboratory 3 was in an intermediate position.

Scatterplots per concentration level of log ratio variables Qₖ show the bivariate margins of the distribution for the set of samples. Ratios with a missing numerator or denominator peak in at least one of the ratios were replaced by arbitrary high or low values, respectively, and are also shown. Fig. 2 is an example, illustrating for one of the laboratory–analyte combinations that variability in the point cloud diminishes at the higher concentrations. Plots of log ratio variables Qₖ in all analyses showed a similar pattern. Laboratory 2, dienestrol, was the only exception where variability did not increase at lower concentration levels.

Fig. 3 shows an example of the fitted relationships and empirical estimates for the standard deviation of the log ratio variables, based on median absolute deviations in each concentration group. For this example a region of constant relative variability in the ratios is reached above 2 µg kg⁻¹. In general, the fitting of exponential curves caused no particular difficulties, except for laboratory 3, 19β-nortestosterone. Here, too many of the reported abundances were zero to allow the calculation of the distribution-free standard deviation from medians. For laboratory 2, dienestrol and 19β-nortestosterone, no exponential curve could be fitted for m/z ratios 217/410 and 290/256, respectively, and it was decided to exclude the ions at m/z 217 and 290 from the models.

The application of the multivariate detection model to the validation data with specification limit ε, based on calibration and confidence level 1 − α = 0.99 for the confirmatory analyses, is presented in Fig. 4. The plots show that the performance of the proposed model is better than confirmation results after applying EU criteria, except for laboratory 2, dienestrol. In general, more true positives are detected at all concentrations. This is illustrated for laboratory 5, where all real positive samples were found, that is, no false negative results at all. The plots for laboratory 3 and, to a lesser extent, laboratory 4 (hormones) show a significant improvement with the new method: EU criteria failed whereas the multivariate model performed fairly well. Application of the multivariate model failed for laboratory 3, 19β-nortestosterone, owing to the presence of more than 50% zeroes for some of the peaks. No multiple peaks were available for laboratory 1, clenbuterol, and laboratory 5, hormones (Table 1).

In Fig. 5, theoretical power (1 − β) of the multivariate model is plotted as a function of abundance level. Power is the complement of the false negative rate: it measures what fraction of real positives are indeed identified as positive by the procedure. At low levels, powers are already high, except for laboratory 2, hormones. Fig. 5 can be used to derive a rough indication of the minimum detectable level by dropping a perpendicular line from the start of each power curve to the x-axis. Samples with levels below the minimum detectable level will not be detected in a multivariate test. Note that Fig. 5 establishes the power as a function of peak abundance level, not as a function of concentration. In Fig. 5, the median abundance levels of the non-zero nominal concentration groups are indicated by tickmarks. For instance, for laboratory 2, 19β-nortestosterone, the fourth tickmark indicates a nominal concentration of 2 µg kg⁻¹. The median abundance level of this concentration group is positioned slightly to the left of the point of minimum detection, meaning that at least 50% of the samples in this concentration group are not detectable.
Discussion
One of the premises at the start was that EU criteria did not match reality and that, in response, each laboratory had developed in-house criteria to cope with the problems. This presumption was found to be true and is demonstrated in Fig. 1. For example, laboratory 3 illustrates very clearly the difference between the strict use of EU criteria and adapted criteria. The high degree of deviating peak patterns produces poor results when the 10% criteria are applied, but laboratory 3 manages to produce very satisfactory results by applying its own criteria. Laboratory 2 reported, for all analytes, abundances with many deviating values. Obviously, at the start of the project the confirmatory step was not working satisfactorily and needed optimization. However, within the framework of the project all laboratories were requested not to change their methods as prescribed in the SOPs. New methods and improved procedures were not incorporated during the process, because comparability of confirmation results was not the aim. Results presented should be considered from this perspective. Note that in Fig. 1 the OC curves for laboratory 2, hormones, are based on four peaks, whereas in Fig. 4 results are based on three peaks.

Fig. 2 and 3 show the main features of the data that are used in the multivariate model and make clear why the model works. Standardized log ratios have a zero expectation and stabilize at high concentration levels of the analyte. Given a specified value for ε, high relative variability at low levels will prevent the analyte from being detected, while at high levels peak ratios deviate less and samples are found positive for the presence of the analyte. Provided that condition (a.7) (see Appendix) is fulfilled, samples can be detected as positives. Conditional on ε, a value can be derived for which it is no longer possible to detect the analyte for mean levels equal to or below that value. The additional condition acts as an appropriate 'detection limit' determined implicitly by the proposed procedure. Ideally, blank samples always have mean levels below this minimum detectable level. This means that, for fixed ε, laboratories can only lower their number of false negative results (β errors) by improving their precision. Laboratories with better precision will detect more positives at lower levels. Thus, minimum detectable levels vary between laboratories and/or analytes depending on their precision.

The multivariate test for detection is governed by conditions (a.6) and (a.7). Provided that condition (a.7) is fulfilled, samples with strongly deviating peak patterns have high values of D² and, consequently, are found negative. Setting specification limits ε wider increases the non-centrality parameter δ until condition (a.6) is fulfilled and samples are detected as positive outcomes.
Table 2 Confirmatory analysis: numbers of positive (+) and negative (−) outcomes and fraction of positive results (p) for clenbuterol, dienestrol and 19β-nortestosterone

Nominal conc./   Laboratory 1       Laboratory 2       Laboratory 3       Laboratory 4       Laboratory 5
µg kg⁻¹          +    −    p        +    −    p        +    −    p        +    −    p        +    −    p

Clenbuterol:
0.0              2    88   0.02     1    99   0.01     0    93   0.00     6    89   0.06     0    86   0.00
0.1              5    86   0.05     1    99   0.01     3    89   0.03     10   85   0.11     0    86   0.00
0.2              23   66   0.26     4    96   0.04     11   82   0.12     25   70   0.26     21   65   0.24
0.5              60   32   0.65     12   88   0.12     32   61   0.34     60   34   0.64     67   19   0.70
1.0              82   10   0.89     23   77   0.23     72   21   0.77     82   13   0.86     85   1    0.99
2.0              88   3    0.97     40   60   0.40     85   8    0.91     86   8    0.91     86   0    1.00

Dienestrol:
0.0              1    95   0.01     1    62   0.02     2    93   0.02     0    78   0.00     0    99   0.00
0.2              3    93   0.03     0    63   0.00     12   84   0.13     0    78   0.00     0    100  0.00
0.5              20   76   0.21     1    62   0.02     21   74   0.22     0    78   0.00     3    97   0.03
1.0              70   25   0.74     2    61   0.03     39   57   0.41     0    78   0.00     89   11   0.89
2.0              93   3    0.97     11   52   0.17     51   45   0.53     0    77   0.00     99   0    1.00
5.0              87   0    1.00     12   41   0.23     59   37   0.61     0    78   0.00     90   0    1.00

19β-Nortestosterone:
0.0              2    96   0.02     0    69   0.00     2    94   0.02     0    73   0.00     0    98   0.00
0.2              3    95   0.03     0    69   0.00     8    86   0.08     0    73   0.00     11   87   0.11
0.5              0    98   0.00     1    68   0.01     16   80   0.17     3    69   0.04     89   9    0.89
1.0              27   71   0.28     2    67   0.03     24   72   0.25     6    67   0.08     97   1    0.99
2.0              73   25   0.75     2    67   0.03     44   52   0.46     14   58   0.23     98   0    1.00
5.0              93   5    0.95     5    64   0.07     70   28   0.71     37   36   0.51     98   0    1.00
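The fraction of positive results in Table 2 is the number of positive outcomes divided by the total number of outcomes at that nominal concentration. As a small worked example, the sketch below recomputes p for laboratory 1, clenbuterol, from the counts in the table; the variable and function names are illustrative only. At the zero concentration p is the observed false positive rate, and at non-zero concentrations 1 − p is the observed false negative rate.

```python
def fraction_positive(n_pos, n_neg):
    """Observed fraction of positive outcomes, p = (+) / ((+) + (-))."""
    return n_pos / (n_pos + n_neg)

# Laboratory 1, clenbuterol (Table 2): nominal concentration -> (+, -)
lab1_clenbuterol = {
    0.0: (2, 88),
    0.1: (5, 86),
    0.2: (23, 66),
    0.5: (60, 32),
    1.0: (82, 10),
    2.0: (88, 3),
}

p = {conc: round(fraction_positive(pos, neg), 2)
     for conc, (pos, neg) in lab1_clenbuterol.items()}

false_positive_rate = p[0.0]            # positives among blank samples
false_negative_rate_top = 1 - p[2.0]    # negatives at the highest level
```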
For values of δ that are too low, condition (a.7) is never met and detection results will always be negative. This situation arises when (i) the dispersion is very high, e.g., laboratory 2 for all analytes, and/or (ii) the specification limits ε are set too narrow. For laboratory 2, 19β-nortestosterone, the majority of the samples with nominal concentrations up to 2 µg kg⁻¹ are not detectable. For laboratory 5, all non-zero nominal concentrations are far above the minimum detectable level. For laboratory 2, strongly deviating peak patterns prevent detection, whereas for laboratory 5 detection is governed by condition (a.6) only.
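The interplay between dispersion and tolerance in condition (a.7) can be made concrete in a simplified special case. The sketch below assumes three monitored ions (so q = 2 log ratios) and uncorrelated ratios, which is a simplification made here for illustration and not a setting taken from the study. Under these assumptions the central chi-squared quantile with 2 degrees of freedom has the closed form −2 ln(1 − p), and the non-centrality parameter reduces to ε² Σ 1/sₖ². Function names are hypothetical.

```python
import math

def chi2_quantile_2df(p):
    """Quantile of the central chi-squared distribution with 2 d.f."""
    return -2.0 * math.log(1.0 - p)

def min_tolerance(s, alpha=0.01):
    """Smallest tolerance for which the non-centrality parameter exceeds
    the (1 - alpha) chi-squared quantile.

    Assumes q = 2 uncorrelated log ratios with standard deviations s[0], s[1],
    so that delta = eps**2 * (1/s[0]**2 + 1/s[1]**2).
    """
    if len(s) != 2:
        raise ValueError("this closed-form sketch only covers q = 2")
    weight = sum(1.0 / s_k ** 2 for s_k in s)
    return math.sqrt(chi2_quantile_2df(1.0 - alpha) / weight)
```

Under these assumptions the threshold scales linearly with the dispersion: a five-fold larger standard deviation of the log ratios requires a five-fold wider tolerance before any sample can be detected, which is the sense in which poor precision alone can make detection impossible for a fixed ε.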
Practical application of the model
The statistical model presented in this paper has been applied in the context of a research project, using calibration to obtain preliminary estimates of appropriate specification limits ε. A legitimate question is whether and how it would be applicable in practice. Clearly, application of the model requires many samples with known analyte concentrations over a relevant range to be analyzed. Therefore, application is restricted to laboratories with a sufficient throughput of samples. A first impression may be that analyzing (say) 20–50 concentration series to obtain calibration data for the model imposes a very large amount of additional work on the laboratory. However, modern validated methods of analysis already include a substantial number of calibration and quality control (QC) samples in their procedure. Typically, each GC-MS run contains 1–2 blank matrix samples, 1–4 spiked matrix samples, and a calibration series of about five standards. It is our impression that a cost-effective integration is possible of calibration, traditional QC, and error rate QC as proposed in this paper. Some laboratories already prefer to calibrate using spiked matrix samples rather than chemical standards. It seems useful to lower the concentrations in part of the regular QC samples in order to ascertain the performance at low levels.

A further point needing practical elaboration is the inclusion of matrix variability in QC procedures. In current routine QC procedures this variability is simply ignored, with unknown consequences. The procedure employed in this paper uses local blank urine samples from individual animals. A subject for future research is how matrix variability can be practically incorporated in routine QC. Thus, we believe that, with a change of protocol with regard to standard and QC samples, data can be generated from routine practice to allow application of the model.

An appropriate value for ε should be found to describe the assumption that the spectral signature of interfering components will at least differ more from that of the analyte than εᵀS⁻¹ε (see Appendix). Obviously, this requires experience with a range of interfering compounds, as well as a more practical implementation of the procedure. In addition, expert opinions will still be needed to assess appropriate values for ε. Note that calibration of the model and specifications of ε are specific for a protocol as operated in one laboratory. Laboratories with high precision should have different values than laboratories that invest more, e.g., in the quality of the clean-up procedures. In our view, harmonized regulations should be concerned with appropriate values of the allowable probabilities of false positive and false negative results α and β (the latter at a specified concentration).

Fig. 1 Detection of clenbuterol, dienestrol and 19β-nortestosterone in urine. Fraction positive confirmation results versus nominal concentrations after applying laboratory-specific criteria are represented by solid circles. Open circles represent confirmation results after applying EU criteria. Concentrations are 0, 0.1, 0.2, 0.5, 1 and 2 µg kg⁻¹ for clenbuterol and 0, 0.2, 0.5, 1, 2 and 5 µg kg⁻¹ for hormones.

Fig. 2 Laboratory 1, 19β-nortestosterone: scatterplots per concentration level of standardized log m/z ratios 215/256 and 346/256. Open circles represent samples with a missing denominator or numerator peak in at least one of the ratios.

Fig. 3 Laboratory 1, 19β-nortestosterone: standard deviation of standardized log ratios as a function of level. Level is the geometric mean of all available peak abundances, internally standardized for laboratories 3, 4 and 5. Empirical estimates are based on median absolute deviations in each concentration level group (0, 0.2, 0.5, 1, 2 and 5 µg kg⁻¹). Circles represent m/z ratio 215/256, triangles m/z ratio 331/256 and squares m/z ratio 346/256.

Fig. 4 Detection of clenbuterol, dienestrol and 19β-nortestosterone in urine. Fraction positive confirmation results versus nominal concentrations. Results of the multivariate model are represented by solid circles. Open circles represent confirmation results after applying EU criteria. Concentrations are 0, 0.1, 0.2, 0.5, 1 and 2 µg kg⁻¹ for clenbuterol and 0, 0.2, 0.5, 1, 2 and 5 µg kg⁻¹ for hormones. Values of ε are depicted in the plots.
Conclusions
Multivariate modeling offers statistically based criteria that may replace the 10 or 20% EU criteria for detection of veterinary drug residues. The performance of the proposed model with specification limits based on calibration is much better than results obtained after applying EU criteria. The multivariate model diminishes the number of false positives and false negatives in the majority of the investigated data. Assessing appropriate values of ε remains a task for future work, requiring both chemical expertise and practical experience with the model.

Appendix
Analyte peaks are corrected for the internal standard, if available, giving $A_1, \ldots, A_p$. Log peak ratios $y_k = \ln(A_k / A_p)$ are calculated, where $A_p$ is the largest analyte peak, $k = 1, \ldots, q$ and $q = p - 1$. The multivariate model is based on standardized log ratios (relative peak abundances):

\[ Q_k = y_k - m_k \qquad (a.1) \]

where $m_k$ is the corresponding log ratio of a standard sample in the same analytical batch (possibly averaged over several standards and/or validation samples). The hypotheses for a test for multivariate detection are:

\[ H_0 : E(Q)^T S^{-1} E(Q) \ge \varepsilon^T S^{-1} \varepsilon \]
\[ H_1 : E(Q)^T S^{-1} E(Q) < \varepsilon^T S^{-1} \varepsilon \]

where $\varepsilon$ is a vector of $q$ identical elements, each equal to the tolerance parameter ε, $E(Q)$ is the expected value of $Q$, $Q$ is the vector $(Q_1, \ldots, Q_q)$ and $S$ is the variance–covariance matrix of $Q$. The diagonal and off-diagonal elements contain estimates of the variance $s_k^2$ and covariance parameters, respectively. The standard deviation is estimated by fitting simple exponential curves:

\[ s_k = a + b \exp(-\kappa x) + e \qquad (a.2) \]

where $s_k$ is the standard deviation of all samples with the same nominal concentration, $a$, $b$ and $\kappa$ are parameters, $x$ is the median over these samples of the geometric mean of $p$ peaks (corrected for the internal standard) and $e$ represents error. Covariances are estimated as $s_i s_j r$, where $r$ is the pooled within-group correlation. The geometric mean for $p$ peak abundances is calculated as:

\[ x = (A_1 \cdot A_2 \cdots A_p)^{1/p} \qquad (a.3) \]

Define the test statistic:

\[ D^2 = Q^T S^{-1} Q \qquad (a.4) \]

where $D^2$, the squared Mahalanobis distance, has a non-central chi-squared distribution⁷ with non-centrality parameter:

\[ \delta = \varepsilon^T S^{-1} \varepsilon \qquad (a.5) \]

The analyte is detected in a test with confidence level $(1 - \alpha)$ if:

\[ D^2 < \chi^2_{q,\delta}(\alpha) \qquad (a.6) \]

provided that

\[ \delta > \chi^2_q(1 - \alpha) \qquad (a.7) \]

where $\chi^2_{q,\delta}(\alpha)$ and $\chi^2_q(1 - \alpha)$ are critical values of the non-central and central chi-squared distribution, respectively, with $q$ degrees of freedom, non-centrality parameter $\delta$ and significance level $\alpha$.
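The decision rule of eqns. (a.4) to (a.7) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Genstat implementation used for the paper: the covariance matrix is built from per-ratio standard deviations and a single pooled correlation, the chi-squared quantiles come from a simple series expansion with bisection (adequate for moderate arguments only), and every function name is hypothetical.

```python
import math

def chi2_cdf(x, df):
    """Central chi-squared CDF via the series for the lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def ncx2_cdf(x, df, nc):
    """Non-central chi-squared CDF as a Poisson(nc/2) mixture of central CDFs."""
    if nc == 0:
        return chi2_cdf(x, df)
    lam, total, cum_w, j = nc / 2.0, 0.0, 0.0, 0
    log_w = -lam                              # log Poisson weight for j = 0
    while cum_w < 1.0 - 1e-12 and j < 1000:
        w = math.exp(log_w)
        total += w * chi2_cdf(x, df + 2 * j)
        cum_w += w
        j += 1
        log_w += math.log(lam) - math.log(j)
    return total

def quantile(cdf, p):
    """Invert a continuous CDF on [0, inf) by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while cdf(hi) < p:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def quad_form(v, S):
    """v' S^-1 v."""
    return sum(v_i * y_i for v_i, y_i in zip(v, solve(S, v)))

def covariance(s, r):
    """S with s_k**2 on the diagonal and s_i * s_j * r elsewhere."""
    return [[s_i * s_j * (1.0 if i == j else r) for j, s_j in enumerate(s)]
            for i, s_i in enumerate(s)]

def detect(Q, s, r, eps, alpha=0.01):
    """True if the analyte is detected according to (a.6) and (a.7)."""
    q = len(Q)
    S = covariance(s, r)
    d2 = quad_form(Q, S)                      # (a.4)
    delta = quad_form([eps] * q, S)           # (a.5)
    if delta <= quantile(lambda x: chi2_cdf(x, q), 1.0 - alpha):
        return False                          # (a.7) not met
    return d2 < quantile(lambda x: ncx2_cdf(x, q, delta), alpha)   # (a.6)
```

For instance, with two log ratios of standard deviation 0.1, no correlation and ε = 0.5, a sample whose standardized log ratios are both zero is detected, whereas one with ratios of 1.0 and −1.0 is not; with ε = 0.1 nothing can be detected because condition (a.7) fails.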
Acknowledgments This work was financially supported by the European Commission under contract AIR-3 CT 94 1415.
Fig. 5 Detection of clenbuterol, dienestrol and 19β-nortestosterone in urine. Theoretical power of the multivariate model as a function of level. Level is the geometric mean of all available peak abundances, internally standardized for laboratories 3, 4 and 5. Ticks indicate median levels for nominal concentrations 0.1, 0.2, 0.5, 1 and 2 µg kg⁻¹ for clenbuterol and 0.2, 0.5, 1, 2 and 5 µg kg⁻¹ for hormones. Levels of laboratories 1 and 2 are multiplied by 10⁻⁵. The horizontal line shown represents α = 0.01.

References
1 Commission Decision of 14 April 1993 (93/256/EEC), Off. J. Eur. Comm., 1993, L118, 64.
2 Commission Decision of 15 April 1993 (93/257/EEC), Off. J. Eur. Comm., 1993, L118, 75.
3 W. G. De Ruig, R. W. Stephany and G. Dijkstra, J. Assoc. Off. Anal. Chem., 1989, 72, 487.
4 H. Van der Voet, W. J. De Boer, W. G. De Ruig and J. A. Van Rhijn, J. Chemometr., 1998, 12, 279.
5 F. R. Hampel, J. Am. Stat. Assoc., 1974, 69, 383.
6 Genstat 5 Committee, Genstat 5 Release 3 Reference Manual, Clarendon Press, Oxford, 1993.
7 E. S. Pearson, Biometrika, 1959, 46, 364.
Paper 8/07051B