Detection of heterogeneous wheat samples using near infrared

M.E. Lafargue et al., J. Near Infrared Spectrosc. 11, 109–121 (2003)


Detection of heterogeneous wheat samples using near infrared spectroscopy

Marie E. Lafargue,a Max Feinberg,b Jean-Jacques Daudinc and Douglas N. Rutledgeb

a Bipea (Bureau Interprofessionnel d’Études Analytiques), 6/14 avenue Louis Roche, 92230 Gennevilliers, France. E-mail: [email protected]

b Institut National Agronomique Paris-Grignon, UMR INAPG-INRA Ingéniérie Analytique pour la Qualité des Aliments (IAQA), 16 rue Claude Bernard, 75005 Paris, France

c Institut National Agronomique Paris-Grignon, UMR INAPG-INRA Biométrie, 16 rue Claude Bernard, 75005 Paris, France

A homogeneity check using Near Infrared (NIR) spectroscopy has been developed for agricultural and agro-food samples prepared for Proficiency Testing Schemes (PTS) at Bipea. To evaluate the homogeneity among samples, the procedure involves a comparison of NIR spectra, the determination of global homogeneity criteria and the use of control charts. To study the performance of the method, “heterogeneous” wheat samples were artificially prepared and several methods were tested to detect them. Wheat samples were analysed by diffuse reflection in the 10,500 to 3800 cm⁻¹ spectral range (or 952 to 2631 nm). To detect the “heterogeneous” samples among the “homogeneous” samples, a classical outlier detection method (Grubbs’ test) was first applied to quantitative results obtained with NIR calibrations (protein and moisture contents). A second, qualitative, approach was also used, based on processing of the spectra: calculation of “standard deviation spectra” and “difference spectra”; application of Grubbs’ test directly to the spectra; and computation of Euclidian distances. The results confirm that the NIR technique is well suited to checking homogeneity among samples prepared for PTS and that processing the whole spectra is more efficient than processing only the protein and moisture contents obtained from NIR calibrations. “Heterogeneous” wheat samples differing only slightly in their protein and moisture contents from the “homogeneous” wheat samples could be detected (deviation among samples down to 0.03% of protein content and 0.08% of moisture content).

Keywords: homogeneity, near infrared spectroscopy, outlier detection, Grubbs’ test, standard deviation spectra, difference spectra, distances

© NIR Publications 2003, ISSN 0967-0335

Introduction

The Bureau Interprofessionnel d’Études Analytiques (Bipea) is a Proficiency Testing Scheme (PTS) organiser for the agricultural, food and environmental fields, operating according to ISO Guide 43-1,¹ and counts more than 900 laboratories participating in more than 40 schemes. One of the most important requirements for a PT organiser is to provide homogeneous samples. In that way, if a laboratory obtains a result different from those of the other laboratories, the error can be attributed to its analysis method and not to the particular sample it received. The control is described in a new standard in preparation, the ISO 13528 draft norm:² at least 10 samples taken from the total population during the preparation must be analysed in replicate in order to determine the “homogeneity targets”. The results of these determinations permit a conclusion to be drawn concerning the homogeneity.

To meet quality requirements and to improve its services, Bipea has developed a control procedure using Near Infrared (NIR) spectroscopy.³ Instead of using NIR spectroscopy in the usual way, to determine chemical parameters such as protein content, the control evaluates the homogeneity among samples by a direct comparison of the NIR spectra and the determination of global homogeneity criteria. The method of control carried out at Bipea permits rapid and easy monitoring of the performance of the sample preparation. It should be stressed that the aim of the control is to check that the same “degree of homogeneity” is found in all the samples. For example, if the samples are made from a mixture of four different products (corn, barley, wheat and rapeseed) in equal proportions in the initial batch (25% each), then at the end of the homogenisation and division steps each individual prepared sample must contain 25% of each product.

The aim of this work is to study the performance of the homogeneity control based on NIR spectroscopy by introducing “heterogeneous” wheat samples into a set of homogeneous samples. From two sets of wheat samples, heterogeneous samples were prepared to simulate errors of preparation and were analysed by NIR spectroscopy. To detect the “heterogeneous” samples among the “homogeneous” samples, a classical outlier detection method (Grubbs’ test) was first applied to quantitative results obtained with NIR calibrations (protein and moisture contents). A second, qualitative, approach was also used, based on processing of the spectra: calculation of “standard deviation spectra” and “difference” spectra; application of Grubbs’ test directly to the spectra; and computation of Euclidian distances.

Materials and methods

Wheat samples

Two kinds of wheat were used to prepare the mixtures for the study: common wheat n°1 (Crousty variety) and common wheat n°2 (Meunier variety). They were chosen for their protein contents, 10.55% for n°1 and 11.84% for n°2, so that mixing them would give samples “heterogeneous” in protein content. The moisture content is 14.28% for wheat n°1 and 13.52% for wheat n°2. All these values are reference values taken from the PTS results.

The samples were prepared with a riffle divider, a device that homogenises a sample and divides it into two identical sub-samples (from a 1 kg sample, two division steps are necessary to obtain two 0.250 kg sub-samples for the NIR analysis). “Heterogeneous” samples were prepared from wheat n°1 by adding given masses of wheat n°2. This contamination by wheat n°2 varies between 0% and 100%. Table 1 presents the composition of the “heterogeneous” samples and their theoretical protein and moisture contents, based on the reference values. The variations in the protein and moisture contents caused by the addition of wheat n°2 are also given in the table. By convention, samples of wheat n°1 with 0% of wheat n°2 are called the “homogeneous” samples in the study and the samples with the various levels of wheat n°2 are the “heterogeneous” samples.

The wheat samples were used to reproduce the conditions of the homogeneity control as carried out at Bipea, using the protocol described in the introduction: sets of 10 samples analysed in duplicate by NIR spectroscopy, and a decision on the homogeneity among samples based on statistical analysis of the spectral data.
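The theoretical contents in Table 1 are consistent with a simple mass-weighted mean of the two reference values. The sketch below illustrates that calculation; the linear mixing rule is an assumption inferred from the tabulated values, and the function names are hypothetical.

```python
# Reference values quoted in the text (PTS results), in percent.
WHEAT_1 = {"protein": 10.55, "moisture": 14.28}
WHEAT_2 = {"protein": 11.84, "moisture": 13.52}


def theoretical_content(pct_wheat_2, constituent):
    """Mass-weighted mean of the two reference values (assumed linear mixing)."""
    f = pct_wheat_2 / 100.0
    return (1.0 - f) * WHEAT_1[constituent] + f * WHEAT_2[constituent]


def deviation(pct_wheat_2, constituent):
    """Shift relative to pure wheat n°1, as in the 'Deviation' columns."""
    return theoretical_content(pct_wheat_2, constituent) - WHEAT_1[constituent]


if __name__ == "__main__":
    for pct in (0, 2, 10, 40, 80):
        print(pct,
              f"{theoretical_content(pct, 'protein'):.3f}",
              f"{theoretical_content(pct, 'moisture'):.3f}")
```

Because wheat n°2 has the higher protein content and the lower moisture content, the protein deviation is positive and the moisture deviation negative at every contamination level.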

Table 1. Composition of the samples with the proportion in percentage of wheat n°1 and wheat n°2 and their theoretical protein and moisture contents.

% wheat n°1   % wheat n°2   Protein content (%)   Deviation   Moisture content (%)   Deviation
100           0             10.55                 0.00        14.28                  0.00
98            2             10.58                 0.03        14.26                  –0.02
97            3             10.59                 0.04        14.26                  –0.02
95            5             10.61                 0.06        14.24                  –0.04
90            10            10.68                 0.13        14.20                  –0.08
80            20            10.81                 0.26        14.13                  –0.15
70            30            10.94                 0.39        14.05                  –0.23
60            40            11.07                 0.52        13.98                  –0.30
50            50            11.20                 0.65        13.90                  –0.38
40            60            11.32                 0.77        13.82                  –0.46
20            80            11.58                 1.03        13.67                  –0.61

Spectroscopic analysis and spectral data pre-treatments

The NIR measurements were performed with a Bruker Vector 22N/C spectrometer equipped with an external integrating sphere module consisting of: a quartz window with a rotating device, an integrating sphere where the diffusely reflected light is collected, homogenised and detected, and a gold standard as external reference. The wheat sub-samples were analysed directly (without grinding) by diffuse reflection in the 10,500–3800 cm⁻¹ spectral range (or 952 to 2631 nm) using a 9 cm diameter quartz cell. The integrating module makes it possible, in particular, to acquire NIR spectra from inhomogeneous samples such as wheat, because rotating the sample on the quartz cell increases the measurement area and time-averages the signal. The resolution was fixed at 8 cm⁻¹ and each spectrum acquired is the average of 32 scans of the sub-sample (measurement time of about 30 s). The spectra were recorded using the OPUS acquisition software (Bruker). All sub-samples (duplicates) were analysed in random order under repeatability conditions (same operator, same spectrometer, same quartz cell, short delay between analyses).

It is usual when processing NIR spectra to perform mathematical pre-treatments. The spectra are affected by both the concentration of the chemical constituents and the physical properties of the analysed product (particle size and distribution). These physical properties account for the majority of the variance among spectra, while the variance due to chemical composition is considered small. Mathematical pre-treatments can reduce the effect of scatter and enhance the contribution of the chemical composition. In our application, the aim is to detect the “heterogeneous” samples with respect to both their chemical composition and their granularity. That is why both pre-treated and raw spectra were processed in the study. The pre-treatments used for the calibrations were Multiplicative Scatter Correction (MSC) and the first order derivative, whereas Standard Normal Variate (SNV) was used for the global approach.

Berntsson et al.⁴ studied binary powder mixtures and chose to apply MSC to reduce the effect of scatter. They were able to show that most of the variation in predicted content within each sample is due to heterogeneity with respect to content. The MSC method, originally developed by Geladi et al.,⁵ involves the calculation of two correction factors for each spectrum by linear regression of the spectrum onto the mean spectrum. Each spectrum is corrected with the two factors, called offset and slope, by subtraction of the offset followed by division by the slope.

Another pre-treatment widely applied to NIR spectra is SNV, developed by Barnes et al.⁶ Garrido-Varo et al.⁷ used SNV pre-treatment to extract relevant information from the spectra of agro-food products. SNV normalises each spectrum individually (the mean of the absorbances is subtracted from each absorbance in the spectrum, which is then divided by the standard deviation) to reduce the variations of intensity of the spectra. Finally, first and second order derivatives can also be calculated on NIR spectra. The calculation is done using the Savitzky–Golay algorithm.⁸
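The two scatter corrections described above can be illustrated with a minimal sketch in plain Python. It is not the OPUS implementation: spectra are assumed to be lists of absorbances on a common wavenumber grid, the sample (n – 1) standard deviation is assumed for SNV, and the function names are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: centre one spectrum on its own mean and
    divide by its own standard deviation (sample SD assumed here)."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(a - m) / s for a in spectrum]


def msc(spectra):
    """Multiplicative Scatter Correction: regress each spectrum on the mean
    spectrum, then subtract the offset and divide by the slope."""
    n_points = len(spectra[0])
    ref = [mean([s[j] for s in spectra]) for j in range(n_points)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Ordinary least squares fit: s ≈ offset + slope * ref
        slope = sum((r - ref_mean) * (a - s_mean) for r, a in zip(ref, s)) / sxx
        offset = s_mean - slope * ref_mean
        corrected.append([(a - offset) / slope for a in s])
    return corrected
```

Note the design difference: SNV works on each spectrum in isolation, whereas MSC needs a reference (here the mean of the set), so the MSC result for one spectrum depends on which other spectra are in the set.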

Data processing

The NIR technique is widely used for the quality control of cereals and their derived products through the determination of physico-chemical parameters such as protein content, and the classification of cereals according to their quality and variety.⁹⁻¹² The “heterogeneous” samples can be detected by determining chemical criteria after calibration and testing the predicted values with an outlier detection method, such as Grubbs’ test. They can also be detected using the global spectra, following the principle of the homogeneity check developed at Bipea, where homogeneity is assessed by direct comparison of spectra. Therefore, in this work, both a quantitative approach using outlier detection on predicted composition values and a global approach based on processing of the global spectra were used to detect the “heterogeneous” samples, in order to evaluate the performance of the NIR technique. The OPUS (Bruker) and JMP (version 4, SAS Institute) software packages were used to analyse the data.

Quantitative approach

Calibrations: The Partial Least Squares (PLS) calibration models were obtained by cross-validation.¹³,¹⁴ Various pre-treatments were compared; Multiplicative Scatter Correction (MSC) was chosen for the protein calibration and the first order derivative + MSC for the moisture calibration. According to Berntsson et al.,¹⁵ after MSC pre-treatment, NIR spectra mainly contain information about the chemical composition. The first derivative pre-treatment appears to improve the prediction results for the moisture calibration. The calibration samples came from the PTS schemes of Bipea (about 40 samples for the protein determination and 60 for the moisture determination). The reference values are very reliable because they were taken from the PTS results (robust mean of the results of reference laboratories using reference methods). It should be noted that what is important for homogeneity control is not the accuracy (trueness) of the results but their precision. Both characteristics were studied with the use of reference materials and precision studies. The results of these studies show that for these characteristics the performance of the NIR technique is satisfactory (nearly equivalent to the performance of the reference method). The characteristics of the calibrations used in the study are the following: for the protein content, calibration range from 9.0% to 15.3%, number of factors equal to 10, R² (coefficient of determination) equal to 0.96; for the moisture content, calibration range from 11.5% to 18.6%, number of factors equal to 10 and R² equal to 0.99. The PLS models established with these two data sets were then used in all subsequent homogeneity tests in the paper.

Grubbs’ test: Grubbs’ test is a robust outlier detection method described by Grubbs et al.¹⁶ and recommended in the ISO 5725-2 standard.¹⁷ It tests for outlying values (lowest and highest values) by calculating Grubbs’ statistics and comparing them to a table of critical values. If the statistic calculated for the tested value is lower than or equal to its 1% critical value, the value is accepted. If it is greater than its 1% critical value, the value is considered an outlier. The Grubbs’ statistic is calculated as follows: to test whether the highest value (Max) is an outlier, the Sup Grubbs’ value = (Max – Mean) / Standard Deviation; to test whether the lowest value (Min) is an outlier, the Inf Grubbs’ value = (Mean – Min) / Standard Deviation. Grubbs’ test was applied to the protein and moisture contents obtained with the NIR calibrations in order to detect the “heterogeneous” samples in the sets of samples. Furthermore, the effect on the test result of having more than one “heterogeneous” sample in the set of spectra was studied by varying the number of “heterogeneous” samples.

Global approach

Basic treatments: Two basic treatments were applied to the spectra: calculation of the Standard Deviation (SD) spectra and use of the difference spectra to emphasise the zones of variability.


One of the simplest operations for studying the variability of a collection of spectra is to calculate the SD spectrum. The SD spectra were calculated from a set of 10 “homogeneous” spectra, then from sets of nine “homogeneous” spectra and one “heterogeneous” spectrum (2%, 3%, 5%, …, 80% of wheat n°2), in order to see the effect on the variability. Difference spectra, calculated by selecting two samples with different compositions, are a useful tool to identify spectral bands and to visualise zones of maximal variation between two families of samples. Garrido-Varo et al.⁷ calculated difference spectra to extract information on agro-food products and noted that the extracted information is more relevant after pre-treatment of the spectra.

Grubbs’ test: Grubbs’ test was also performed directly on the NIR spectra: instead of being applied to quantitative results as previously described, it was applied to the absorbances. This means that the test was repeated as many times as there are wavenumbers, that is 1738 times for the wheat spectra. For each test, the lowest and the highest values were tested. As it is important in our case to avoid false negatives (heterogeneous samples being considered homogeneous), the effect of multiple testing is not taken into account. Also, as Grubbs’ test is applied here to points in a signal, it is possible, by examining the plots of the values, to distinguish random outlying values from physico-chemically based outliers.

Distances: For each set of 10 spectra, the distances between each spectrum and the mean spectrum of the set were calculated. The maximal distance (Dmax), corresponding to the spectrum most different from the nine others, was noted for each set. Ten sets of 10 spectra were created: the first set consisted of 10 spectra of “homogeneous” samples (wheat n°1), followed by nine sets of nine spectra of “homogeneous” samples plus one “heterogeneous” sample containing from 2% to 80% of wheat n°2. Furthermore, the effect on the distances when more than one “heterogeneous” sample is added to the set was studied by adding two, three, … up to nine “heterogeneous” samples with the same level of wheat n°2. Two kinds of distances were calculated, the Euclidian distance and the Mahalanobis distance, but in this paper only the results based on Euclidian distances are presented. The effect of the presence of “heterogeneous” samples in the set of samples can be followed according to the percentage of contamination by wheat n°2.

The Euclidian distance between each spectrum and the mean spectrum was calculated according to the following formula, computed for each series of 10 control samples: $D^2(i,m) = \sum_j (y_{ij} - y_{mj})^2$, where $y_{ij}$ is the absorbance of spectrum $i$ at wavenumber $j$ and $y_{mj}$ is the absorbance of the mean spectrum at the same wavenumber. Dmax is the maximal distance in the set, $D_{\max} = \max_i D^2(i,m)$. Mark and Tunnell¹⁸ have proposed classifying an unknown sample by measuring the Mahalanobis distance between the sample and each of the known groups of products, as it is a global measure of the spectral similarity of the unknown sample with each of the groups. In this work, the Euclidian distance is used in the same way, to try to detect “heterogeneous” samples. The Euclidian distances were calculated for each set of 10 spectra and the maximal Euclidian distances were noted.
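The distance criterion above can be sketched in a few lines of plain Python. This is an illustrative reading of the formula, not the authors' code: spectra are assumed to be equal-length lists of absorbances, and the function names are hypothetical.

```python
def mean_spectrum(spectra):
    """Point-by-point mean of a set of spectra."""
    n = len(spectra)
    return [sum(s[j] for s in spectra) / n for j in range(len(spectra[0]))]


def squared_distances(spectra):
    """Squared Euclidian distance D2(i, m) of each spectrum i to the mean m."""
    m = mean_spectrum(spectra)
    return [sum((y - ym) ** 2 for y, ym in zip(s, m)) for s in spectra]


def d_max(spectra):
    """Return (index, value) of the spectrum farthest from the mean spectrum."""
    d2 = squared_distances(spectra)
    i = max(range(len(d2)), key=d2.__getitem__)
    return i, d2[i]
```

As in the formula, the value returned is the squared distance; taking a square root would not change which spectrum is identified as the most different.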

Results

Two sub-samples were prepared for each “heterogeneous” or “homogeneous” sample and were analysed by NIR. The mean of the two sub-sample spectra was calculated and all the statistical processing was done on the mean spectra.

Spectra

Figures 1 and 2 show the spectra obtained for the “heterogeneous” and the “homogeneous” samples without pre-treatment (raw spectra) and after SNV pre-treatment. As expected, the pre-treatment clearly reduces the effect of variations in the global intensity of the spectra.

Quantitative approach

Determination of the protein and moisture contents

The protein and moisture contents of the samples were determined. This allows a check that the preparation of the “heterogeneous” samples is satisfactory: the differences between the observed and the theoretical values are small, with an RMSEP (root mean square error of prediction) equal to 0.17 for the protein content and to 0.15 for the moisture content. Figures 3 and 4 present the relation between the protein and moisture contents and the percentage of wheat n°2 in the “heterogeneous” samples. As expected, the relation is positive for the protein content and negative for the moisture content. Both relations are linear but the fit for moisture content (coefficient of determination R² = 0.99) is better than for protein content (R² = 0.88). This can be explained by the larger number of calibration samples used for the moisture model and also by the higher sensitivity of the NIR technique for moisture determination.

Figure 1. Raw spectra (without pre-treatment) of 10 “heterogeneous” (2%, 3%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 80% and 100% of wheat n°2) samples and one “homogeneous” sample (0% of wheat n°2).

Figure 2. SNV spectra (SNV pre-treatment) of 10 “heterogeneous” (2%, 3%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 80% and 100% of wheat n°2) samples and one “homogeneous” sample (0% of wheat n°2).

Figure 3. Protein content versus percentage of wheat n°2 in the “heterogeneous” samples.

Figure 4. Moisture content versus percentage of wheat n°2 in the “heterogeneous” samples.


Grubbs’ test

Grubbs’ test was performed successively on the protein and moisture content results, testing the highest and lowest values in the set of 10 “homogeneous” samples, then in the sets of nine “homogeneous” samples and one “heterogeneous” sample (2%, 3%, …, 80%). For the protein content, only the highest Grubbs’ value was calculated because the more wheat n°2 the “heterogeneous” samples contain, the higher their protein content. Conversely, for the moisture content, the lowest Grubbs’ value was calculated. The critical values found in the statistical table depend on the number of samples p. In our case, with p equal to 10, the critical value is 2.482 at the 1% probability level, which means that if the calculated Grubbs’ statistic is greater than 2.482, the value tested can be considered an outlier. The results are shown in Table 2 for the test on the protein content and in Table 3 for the test on the moisture content.

Table 2. Results of the Grubbs’ test for the protein content. Observed value calculated by testing the protein content of each “heterogeneous” sample with critical value equal to 2.482 (1% probability).

% wheat n°2   One “heterogeneous” sample in the set   Two “heterogeneous” samples in the set
2             1.584
3             1.580
5             1.578
10            1.564
20            1.622
30            2.385                                   2.230
40            2.530                                   2.368
50            2.733                                   2.559
60            2.648                                   2.479
80            2.721                                   2.548
100           2.778                                   2.602

Table 3. Results of Grubbs’ test for the moisture content. Observed value calculated by testing the moisture content of each “heterogeneous” sample with critical value equal to 2.482 (1% probability).

% wheat n°2   One “heterogeneous” sample in the set   Two “heterogeneous” samples in the set
2             1.377
3             1.049
5             1.849
10            2.667                                   1.833
20            2.814                                   1.887
30            2.833                                   1.893
40            2.838                                   1.895
50            2.841                                   1.896
60            2.843                                   1.896
80            2.844                                   1.897
100           2.845                                   1.897

For the protein content, one “heterogeneous” sample is detected from the sample containing 40% of wheat n°2 upwards (corresponding to an increase of 0.52% in the protein content), and for the moisture content from the sample containing 10% of wheat n°2 upwards (corresponding to a decrease of 0.08% in the moisture content). The detection of two “heterogeneous” samples is not possible for the moisture content but is feasible for the protein content at the levels of 50%, 60%, 80% and 100% of wheat n°2. These percentages may appear high, but it should be remembered that the relative variations in protein and moisture contents are less than +10% and –5% for 80% of wheat n°2.

The effect on the Grubbs’ values of varying the number of “heterogeneous” samples in the set of 10 samples was studied by adding one, two, three, … and finally nine “heterogeneous” samples. The Grubbs’ values obtained according to the number of “heterogeneous” samples (N) are shown in Figure 5 for the test on the protein content and in Figure 6 for the test on the moisture content. Only the results obtained for 40% to 100% for the protein and for 10% to 100% for the moisture are shown, corresponding to the outlier samples detected previously when one “heterogeneous” sample is added to the set of 10 samples. The Figures show that for N = 1 the Grubbs’ values increase with each level and are above the critical value. They then progressively decrease as the number of “heterogeneous” samples increases. For the protein content, no “heterogeneous” samples are detected with N = 3, whereas for the moisture content this already happens at N = 2. It appears, then, that Grubbs’ test on quantitative results is able to detect one “heterogeneous” sample but not more than one.

Figure 5. Protein content. Superior Grubbs’ values versus number of “heterogeneous” samples (40%, 50%, 60%, 80%, 100%).

Figure 6. Moisture content. Inferior Grubbs’ values versus number of “heterogeneous” samples (10%, 20%, 30%, 40%, 50%, 60%, 80%, 100%).

Figure 7. Standard deviation (SD) spectra calculated on sets of raw spectra (without pre-treatment) with nine “homogeneous spectra” and successively each “heterogeneous spectrum” with 0% of wheat n°2, then 2%, 3%, 5%, 10%, 20%, 30%, 50%, 60% and finally 80% of wheat n°2.

Figure 8. Standard deviation (SD) spectra calculated on sets of pre-treated spectra (SNV pre-treatment) with nine “homogeneous spectra” and successively each “heterogeneous spectrum” with 0% of wheat n°2, then 2%, 3%, 5%, 10%, 20%, 30%, 50%, 60% and finally 80% of wheat n°2.
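The Sup and Inf Grubbs’ values defined in the methods can be sketched as below, together with an idealised illustration of why a second “heterogeneous” sample escapes detection. The sketch assumes the sample (n – 1) standard deviation; the moisture values are the theoretical ones from Table 1 with no measurement noise, which is a simplification, and the function names are hypothetical.

```python
from statistics import mean, stdev

CRITICAL_1PCT_P10 = 2.482  # critical value quoted in the text for p = 10


def grubbs_values(values):
    """Return (Sup, Inf) Grubbs' values for one set of results."""
    m, s = mean(values), stdev(values)  # sample SD (n - 1), an assumption
    return (max(values) - m) / s, (m - min(values)) / s


def is_outlier(g, critical=CRITICAL_1PCT_P10):
    """A value is flagged only if its statistic exceeds the critical value."""
    return g > critical


# Idealised moisture sets: wheat n°1 at 14.28%, 10% mixture at 14.20%.
one_het = [14.28] * 9 + [14.20]
two_het = [14.28] * 8 + [14.20] * 2

_, inf_one = grubbs_values(one_het)
_, inf_two = grubbs_values(two_het)
print(round(inf_one, 3), is_outlier(inf_one))
print(round(inf_two, 3), is_outlier(inf_two))
```

In this noise-free case a single deviating value gives (n – 1)/√n ≈ 2.846 for n = 10, above the critical value, while two identical deviating values give about 1.897, below it, whatever the size of the deviation. The limiting values in Table 3 (2.845 and 1.897) are consistent with these bounds, which illustrates the masking effect: the second outlier inflates the standard deviation enough to hide both.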

Global approach

Basic treatments

First, the SD spectra were calculated on the 10 spectra corresponding to the 10 “homogeneous” samples, then on nine “homogeneous” samples and one “heterogeneous” sample with 2% of wheat n°2, then 3%, … and finally 80%. The calculations were done on raw spectra and on spectra pre-treated by SNV. Figures 7 and 8 show the SD spectra obtained from the raw spectra and from the SNV pre-treated spectra. Figure 7 shows that the higher the percentage of wheat n°2, the higher the intensity of the SD spectrum, which means that the introduction of one “heterogeneous” sample into the set of nine “homogeneous” samples increases the variability among samples. In Figure 8, the increase in variability is not as clear, but the increases in intensity are located in precise zones, whereas Figure 7 shows a global increase in the spectral intensity. The SD spectra therefore seem to be a good way to detect one outlier sample in a set, but this of course requires knowledge of the SD spectrum calculated from the “homogeneous” samples.

Second, to try to identify the zones of variation between spectra of “homogeneous” and “heterogeneous” samples, differences were computed between the spectrum of a sample with 0% of wheat n°2 and, successively, the spectra of samples with 2%, 3%, … and 80% of wheat n°2. Figures 9 and 10 present the differences calculated for raw spectra and for SNV-treated spectra. The differences between the spectra are due to both particle size and chemical composition (protein content, moisture content but also hardness). In Figure 9, it can be seen that the intensity of the difference spectra increases with the percentage of wheat n°2 present in the samples. Figure 10 is more interesting because it emphasises zones of variation of intensity in the spectra which correspond to the absorption of protein molecules, identified at 4048, 4367, 4567, 4590, 4675, 4764 and 5045 cm⁻¹ (or 2470, 2290, 2190, 2180, 2139, 2100 and 1982 nm). These difference spectra show that the differences between the samples are linked to the protein content, confirming the origin of the heterogeneity among the samples.

Figure 9. Difference raw spectra (without pre-treatment) between “homogeneous spectra” (containing 0% of wheat n°2) and successively each “heterogeneous spectrum” with 2%, then 3%, 5%, 10%, 20%, 30%, 50%, 60% and finally 80% of wheat n°2.

Figure 10. Difference SNV spectra (SNV pre-treatment) between “homogeneous spectra” (containing 0% of wheat n°2) and successively each “heterogeneous spectrum” with 2%, then 3%, 5%, 10%, 20%, 30%, 50%, 60% and finally 80% of wheat n°2.
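The two basic treatments discussed above amount to simple point-by-point operations, sketched here in plain Python. Spectra are assumed to be equal-length lists of absorbances; the sign convention of the difference (other minus reference) and the function names are assumptions made for illustration.

```python
from statistics import stdev


def sd_spectrum(spectra):
    """Standard deviation across the set at each spectral point."""
    return [stdev([s[j] for s in spectra]) for j in range(len(spectra[0]))]


def difference_spectrum(reference, other):
    """Point-by-point difference between two spectra (other - reference)."""
    return [b - a for a, b in zip(reference, other)]
```

Comparing the SD spectrum of a contaminated set with that of a purely “homogeneous” set shows where the added sample inflates the variability, while the difference spectrum against a 0% sample shows the sign and location of the change.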

Grubbs’ test

Grubbs’ test was performed on the NIR spectra, testing the lowest and the highest absorbances at each wavenumber in the set of 10 “homogeneous” samples, then in the sets of nine “homogeneous” samples and one “heterogeneous” sample (2%, 3%, …, 100%). The critical value remains 2.482. The Grubbs’ value obtained when the lowest absorbance is tested is called the Inf Grubbs’ value, and the one obtained when the highest absorbance is tested is called the Sup Grubbs’ value. The tests were done on both raw spectra and SNV-treated spectra. Figure 11 shows the results for the raw data, with in (a) the calculated Grubbs’ values when all the samples are “homogeneous” and in (b) the calculated Grubbs’ values when a 2% “heterogeneous” sample is present in the set of samples. The Inf Grubbs’ values allow the outlier sample to be detected at each absorbance point. The same phenomena are observed for the higher levels of wheat n°2 in the “heterogeneous” sample.

Figure 12 presents the results obtained from SNV-treated spectra: (a) for the test on 10 “homogeneous” samples, (b) for nine “homogeneous” samples and one 2% “heterogeneous” sample. Compared with the raw spectra, the variations of the Grubbs’ values along the spectral range are larger and the critical Grubbs’ value is exceeded only sporadically. At around 5000 cm⁻¹, the Inf Grubbs’ value is greater than the critical value: the “heterogeneous” sample is detected, based on the chemical composition. Grubbs’ test performed directly on the spectra is able to detect one “heterogeneous” sample, even when the difference between samples is very small.
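Applying Grubbs’ test at every wavenumber can be sketched as follows. This is an illustrative reading of the procedure rather than the authors' implementation: the sample (n – 1) standard deviation is assumed, points with no variability are given a statistic of zero by convention, no correction for multiple testing is applied (as stated in the methods), and the function names are hypothetical.

```python
from statistics import mean, stdev


def grubbs_profiles(spectra):
    """Sup and Inf Grubbs' values at each spectral point of a set of spectra."""
    sup, inf = [], []
    for j in range(len(spectra[0])):
        column = [s[j] for s in spectra]
        if max(column) == min(column):
            # No variability at this point: nothing to test (convention).
            sup.append(0.0)
            inf.append(0.0)
            continue
        m, sd = mean(column), stdev(column)
        sup.append((max(column) - m) / sd)
        inf.append((m - min(column)) / sd)
    return sup, inf


def flagged_points(profile, critical=2.482):
    """Indices of spectral points where the statistic exceeds the critical value."""
    return [j for j, g in enumerate(profile) if g > critical]
```

Plotting the two profiles against wavenumber, as in Figures 11 and 12, then makes it possible to see whether the critical value is exceeded over a continuous band, which suggests a physico-chemical origin, or only at isolated points.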

Distances The maximal Euclidian distances (Euclidian Dmax) were calculated on sets of 10 spectra corresponding to nine “homogeneous” samples and one “heterogeneous” sample, successively 2%, 3%, 5%...and 100 % of wheat n°2. The Euclidian Dmax versus the percentage of wheat n°2 in the samples calculated for the raw spectra and for the spectra pretreated by SNV are plotted in Figures 13 and 14.

M.E. Lafargue et al., J. Near Infrared Spectrosc. 11, 109–121 (2003)

119

Figure 11. (a) Results of Grubbs’ test performed on 10 “homogeneous raw spectra” (containing 0% of wheat n°2). (b) Results of Grubbs’ test performed on nine “homogeneous raw spectra” and one “heterogeneous raw spectrum” (containing 2% of wheat n°2).

Figure 12. (a) Results of Grubbs’ test performed on 10 “homogeneous SNV pre-treated spectra” (containing 0% of wheat n°2). (b) Results of Grubbs’ test performed on nine “homogeneous SNV pre-treated spectra” and one “heterogeneous SNV pre-treated spectrum” (containing 2% of wheat n°2).

An increase in the Euclidian Dmax is observed as a function of the percentage of wheat n°2: (i) for the raw spectra, as soon as 2% of wheat n°2 is present in the sample, which corresponds to an increase of 0.03% in protein content and a decrease of 0.02% in moisture content; (ii) for the SNV spectra, as soon as 10% of wheat n°2 is present in the sample, which corresponds to an increase of 0.13% in protein content and a decrease of 0.08% in moisture content. As shown in Figures 13 and 14, a linear relation can be established between the Euclidian Dmax and the percentage of wheat n°2 present in the “heterogeneous” sample included in the set of “homogeneous” samples. The fits obtained with and without pre-treatment of the spectra are satisfactory (coefficient of determination R² = 0.96).

To study the effect of adding more than one “heterogeneous” sample to the set, the Euclidian Dmax was calculated with one, two, three … and finally nine “heterogeneous” samples in the set of 10 samples. The operation was repeated for each percentage of wheat n°2 in the “heterogeneous” samples. The results are shown in Figure 15. Whatever the quantity of wheat n°2, but especially for the higher percentages, the Euclidian Dmax first decreases until five “heterogeneous” samples have been introduced into the set of 10 samples and then increases. This phenomenon is to be expected because, when more than half of the samples are “heterogeneous”, the “heterogeneous” samples form the majority and in effect tend towards “homogeneous”.

Euclidian distances seem suitable for detecting “heterogeneous” samples: (i) they are easy to calculate; (ii) a linear relation is observed between the calculated Dmax and the degree of heterogeneity (percentage of wheat n°2); (iii) more than one “heterogeneous” sample can be detected.

Figure 13. Euclidian Dmax calculated on raw spectra (without pre-treatment) versus percentage of wheat n°2 in the “heterogeneous” samples.

Figure 14. Euclidian Dmax calculated on SNV spectra (SNV pre-treatment) versus percentage of wheat n°2 in the “heterogeneous” samples.

Figure 15. Euclidian Dmax versus number of “heterogeneous” spectra added in the set of 10 spectra.
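The fall and rise of Dmax around five “heterogeneous” samples can be reproduced with a small numerical sketch. It assumes, as a working hypothesis that this section does not confirm, that Dmax is the largest Euclidean distance between any spectrum of the set and the mean spectrum of the set. The function names and the four-point, noise-free synthetic spectra are purely illustrative and are not the measured wheat spectra.

```python
import math

def mean_spectrum(spectra):
    """Point-by-point mean of a list of equal-length spectra."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]

def euclidean_distance(a, b):
    """Euclidean distance between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def euclidean_dmax(spectra):
    """Largest distance between any spectrum and the set mean (assumed definition)."""
    centre = mean_spectrum(spectra)
    return max(euclidean_distance(s, centre) for s in spectra)

# Synthetic spectra: the "heterogeneous" one differs by 1.0 at a single point.
homogeneous = [1.0, 2.0, 3.0, 4.0]
heterogeneous = [1.0, 2.0, 3.0, 5.0]

def dmax_for(k, n=10):
    """Dmax for a set of n spectra containing k heterogeneous ones."""
    spectra = [heterogeneous] * k + [homogeneous] * (n - k)
    return euclidean_dmax(spectra)

# Dmax for one to nine heterogeneous spectra in a set of 10.
dmax_by_k = {k: dmax_for(k) for k in range(1, 10)}
```

Under this assumption the mean spectrum shifts towards whichever group is larger, so Dmax is set by the minority group. With these synthetic values it falls from about 0.9 with one “heterogeneous” spectrum to 0.5 with five, then returns to about 0.9 with nine, which is the shape described for Figure 15.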

Conclusion

Checking homogeneity among samples implies the choice of a homogeneity target (a specific analyte or property thought to be indicative of uniformity of distribution). This choice always carries a risk, because a sample can be homogeneous for one target but not for another. Comparing the NIR spectra avoids this choice to a certain extent and gives a global picture of the homogeneity among samples.

The study presented here confirms the performance of the NIR technique developed at Bipea to check the homogeneity of the samples prepared for PTS and to detect “heterogeneous” wheat samples. Deviations among samples down to 0.03% of protein content and 0.08% of moisture content could be detected using the qualitative approach. It should be noted that variations as small as those detected by the technique have no influence on the acceptability of the PTS results.

The comparison of the quantitative and qualitative approaches for the detection of outlier samples shows that the qualitative approach is more sensitive and efficient. Grubbs’ test, SD spectra and Euclidian distances permit the detection of one or more outliers, whereas the calculation of difference spectra is a way to determine the origin of the heterogeneity. Since the analysis of the raw spectra can give different results from that of the pre-treated data, both raw and pre-treated spectra should be analysed.

All the results of the study confirm that the NIR technique is well suited to homogeneity checking. Furthermore, it is a rapid, simple, inexpensive and very versatile technique (analysis of solids, powders, pastes and liquids). It can be applied to the control of samples prepared for PTS from the cereals field but also animal feeds, fats and oils, beverages, etc. Nevertheless, although the NIR control method is able to detect a sample which is “heterogeneous” for physico-chemical parameters, such as protein and moisture contents, it may not necessarily be able to do so for trace parameters, such as metals or ascorbic acid. Therefore, if one wishes to test for heterogeneity among samples for trace parameters using the NIR technique, it must first be demonstrated that homogeneity for the physico-chemical parameters of constitution detected by NIR also implies homogeneity for the traces.
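The recommendation to analyse both raw and pre-treated spectra can be illustrated with a minimal sketch of SNV (Standard Normal Variate) pre-treatment, in which each spectrum is centred on its own mean and divided by its own standard deviation. The sample standard deviation (n − 1 denominator) is assumed here, and the spectra are synthetic values chosen for illustration, not data from the study.

```python
import math

def snv(spectrum):
    """Centre a spectrum on its own mean and scale it by its own sample SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / sd for x in spectrum]

def euclidean_distance(a, b):
    """Euclidean distance between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two synthetic spectra that differ only by a scale factor and an offset,
# a simple stand-in for a per-spectrum scatter effect (an assumption).
raw = [0.2, 0.4, 0.9, 0.5]
scattered = [1.5 * x + 0.1 for x in raw]

raw_distance = euclidean_distance(raw, scattered)            # clearly non-zero
snv_distance = euclidean_distance(snv(raw), snv(scattered))  # essentially zero
```

Because SNV removes any per-spectrum offset and scale difference, two spectra that are clearly separated before pre-treatment can coincide afterwards. A difference of that kind is visible only in the raw data, which is one reason the two analyses can disagree.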

References

1. ISO Guide 43-1, Proficiency Testing by Interlaboratory Comparisons – Part 1: Development and Operation of Proficiency Testing Schemes. International Standardisation Organisation, Geneva, Switzerland (1997).
2. ISO Standard norm project 13528, Statistical Methods for Use in Proficiency Testing by Interlaboratory Comparisons. International Standardisation Organisation, Geneva, Switzerland (2001).
3. M.E. Lafargue, M.H. Feinberg, J.J. Daudin and D.N. Rutledge, Anal. Bioanal. Chem. in press (2002).
4. O. Berntsson, L.G. Danielsson, M.O. Johansson and S. Folestad, Anal. Chim. Acta 419, 45 (2000).
5. P. Geladi, D. McDougall and H. Martens, Appl. Spectrosc. 39, 491 (1985).
6. R.J. Barnes, M.S. Dhanoa and S.J. Lister, Appl. Spectrosc. 43, 772 (1989).
7. A. Garrido-Varo, R. Carrete and V. Fernandez-Cabanas, J. Near Infrared Spectrosc. 6, 89 (1998).
8. A. Savitzky and M.J.E. Golay, Anal. Chem. 36(8), 1627 (1964).
9. D. Bertrand, P. Robert and W. Loisel, J. Sci. Food Agric. 36, 1120 (1985).
10. D. Bertrand and P. Robert, Industries des Céréales 33, 7 (1985).
11. M.F. Devaux, D. Bertrand and G. Martin, Cereal Chem. 63(2), 151 (1986).
12. M.F. Devaux, D. Bertrand, P. Robert and M. Qannari, Appl. Spectrosc. 42(6), 1015 (1988).
13. H. Martens and T. Næs, Multivariate Calibration. John Wiley & Sons, Chichester (1983).
14. T. Næs, T. Isaksson, T. Fearn and T. Davies, A User-Friendly Guide to Multivariate Calibration and Classification. NIR Publications, Chichester (2002).
15. O. Berntsson, L.G. Danielsson, M.O. Johansson and S. Folestad, Anal. Chim. Acta 419, 45 (2000).
16. F.E. Grubbs and G. Beck, Technometrics 14, 847 (1972).
17. ISO 5725-2, Application of Statistics – Accuracy (Trueness and Precision) of Measurement Methods and Results – Part 2: Basic Method for the Determination of Repeatability and Reproducibility of a Standard Measurement Method. International Standardisation Organisation, Geneva, Switzerland (1994).
18. H.L. Mark and D. Tunnel, Anal. Chem. 57, 1449 (1985).

Received: 29 October 2002
Accepted: 10 January 2003
Web Publication: 28 April 2003