Derivative Preprocessing and Optimal Corrections for Baseline Drift in Multivariate Calibration
CHRISTOPHER D. BROWN, LORENZO VEGA-MONTOTO, and PETER D. WENTZELL*
Trace Analysis Research Centre, Department of Chemistry, Dalhousie University, Halifax, NS, B3H 4J3, Canada
The characteristics of baseline drift are discussed from the perspective of error covariance. From this standpoint, the operation of derivative filters as preprocessing tools for multivariate calibration is explored. It is shown that convolution of derivative filter coefficients with the error covariance matrices for the data tends to reduce the contributions of correlated error, thereby reducing the presence of drift noise. This theory is corroborated by examination of experimental error covariance matrices before and after derivative preprocessing. It is proposed that maximum likelihood principal components analysis (MLPCA) is an optimal method for countering the deleterious effects of drift noise when the characteristics of that noise are known, since MLPCA uses error covariance information to perform a maximum likelihood projection of the data. In simulation and experimental studies, the performance of MLPCR and derivative-preprocessed PCR is compared to that of PCR with multivariate calibration data showing significant levels of drift. MLPCR is found to perform as well as or better than derivative PCR (with the best-suited derivative filter characteristics), provided that reasonable estimates of the drift noise characteristics are available. Recommendations are given for the use of MLPCR with poor estimates of the error covariance information.
Index Headings: Preprocessing; Baseline drift; Derivative filtering; Savitzky–Golay; Digital filter; Multivariate calibration; Maximum likelihood principal components regression; Principal components regression.
INTRODUCTION
Baseline drift, or drift noise, is inherent in the information generated by many types of analytical instrumentation. The source and magnitude of these drift effects are highly dependent on the nature of the experiment and the type of instrument used, as well as the types of transformations and processing used on the raw data. Instrumental factors such as source intensity instability (flicker), detector response variations, temperature fluctuations, spatial correlations in the detection sensors, and physical variations in the sample can all result in drift noise. Transformations of the data, such as digital smoothing, can also introduce substantial levels of correlated noise.1 While this drift has a variety of origins, it can be broadly characterized as "colored" noise, with a low-frequency dominance in the noise power spectrum. The low-frequency character of drift noise implies that the errors in measurements at different times (or wavelengths) are correlated, a phenomenon which is typically referred to as error covariance. The presence of error covariance indicates that the errors corrupting the observed signals are not behaving like independent random variables, but are in fact related to the errors at other channels.
Received 2 December 1999; accepted 7 March 2000.
* Author to whom correspondence should be sent.
Volume 54, Number 7, 2000
Most multivariate regression methods [e.g., principal components regression (PCR) or partial least-squares regression (PLSR)] assume the independence of noise at different spectral channels (i.e., the noise is uncorrelated). Clearly, then, drift noise can be a detriment to these methods when the observed properties of the data represent a significant departure from the model's assumed properties of the data. In practice, the low-frequency character of the noise can be difficult to distinguish from signals of interest, which also tend to be at low frequencies. Multivariate models built from data with significant levels of drift often require more parameterization [more factors in PCR and PLSR; more variables in multiple linear regression (MLR)] to satisfactorily model the property of interest than models built from drift-free data. Several researchers have probed the issue of parsimony with respect to drift-corrupted data2,3 and have commented on the characteristics and disadvantages of these models.4 Clearly, if drift noise has a deleterious effect on the success of the calibration model, it is desirable to minimize its contribution to the data by spectral preprocessing.
Among the most frequently used drift noise-reduction techniques in multivariate calibration is derivative preprocessing. In principle, first-derivative spectra should be free of baseline offset effects, since the first derivative of any function will eliminate constant terms. Likewise, second- (and higher) derivative spectra should reduce baseline effects which can be modeled as polynomial functions of the ordinal variable (e.g., variations proportional to the ordinal variable will be eliminated with second derivatives). While derivative preprocessing for multivariate analysis is most certainly a common procedure, the role of derivative preprocessing from the calibration perspective has yet to be explored with any degree of rigor.
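The statement that differentiation removes polynomial baseline terms can be checked numerically. The following is a minimal sketch that uses plain finite differences (`numpy.diff`) rather than the least-squares filters discussed later; the single Gaussian band, the 200-channel axis, and the baseline coefficients are illustrative assumptions and are not values taken from this work.

```python
import numpy as np

def gaussian_band(n_channels=200, center=100.0, width=10.0):
    """Return a single Gaussian band sampled on integer channel indices."""
    channels = np.arange(n_channels)
    return np.exp(-0.5 * ((channels - center) / width) ** 2)

def add_polynomial_baseline(spectrum, offset=0.0, slope=0.0):
    """Add a baseline of the form offset + slope * channel to a spectrum."""
    channels = np.arange(spectrum.size)
    return spectrum + offset + slope * channels

# Hypothetical test spectra: one clean band, one with a constant offset,
# and one with an offset plus a baseline proportional to the channel index.
clean = gaussian_band()
with_offset = add_polynomial_baseline(clean, offset=0.5)
with_slope = add_polynomial_baseline(clean, offset=0.5, slope=0.01)

# First differences cancel a constant offset.
offset_removed = np.allclose(np.diff(with_offset, n=1), np.diff(clean, n=1))

# Second differences also cancel a baseline linear in the channel index.
slope_removed = np.allclose(np.diff(with_slope, n=2), np.diff(clean, n=2))

# First differences alone leave the slope behind as a constant shift.
residual_shift = np.diff(with_slope, n=1) - np.diff(clean, n=1)
slope_left_by_first = np.allclose(residual_shift, 0.01)

print(offset_removed, slope_removed, slope_left_by_first)
```

The check says nothing about random drift, which is not a fixed polynomial; that case is what the error covariance treatment in the Theory section addresses.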
The earliest implementations of spectral differentiation were for the purposes of feature enhancement rather than drift noise reduction in calibration.5–7 Since these early applications, several researchers have undertaken studies of the effect of differentiation on peak positions and intensities,8–10 as well as attempted to ascertain the optimal parameters for derivative calculation when resolution enhancement was the goal.11–13 Others have explored the calibration effects of differentiation in specific circumstances on the basis of such considerations as the multivariate sensitivity, selectivity,3 and other figures of merit;14 however, these investigations have principally been focused on post-calibration observations. The actual role of derivative preprocessing and drift correction in general has, to our knowledge, not been explored from a theoretical perspective.
0003-7028/00/5407-1055$2.00/0 © 2000 Society for Applied Spectroscopy
APPLIED SPECTROSCOPY
FIG. 1. (a) Transfer functions for a variety of first-derivative filters, and (b) a comparison of the transfer functions for a 13-point linear first-derivative filter and a 13-point quadratic second-derivative filter.
FIG. 2. (a) A simulated spectrum corrupted by drift noise and the resulting derivative spectrum from applying (b) difference, (c) 3-point linear first-derivative, (d) 13-point linear first-derivative, and (e) 13-point quadratic second-derivative filters to the original spectrum.
In this work, the properties of derivative filters and their effects on chemical signals will be discussed, followed by an examination of the mechanism by which differentiation alleviates drift noise. From this perspective, it will become clear that, in addition to other noted drawbacks, derivative filters are suboptimal in correcting for baseline drift. On the basis of theoretical considerations of the structure of the measurement errors, an optimal filter is derived for the correction of baseline drift in spectral calibration and prediction data. This optimal drift correction filter can be straightforwardly determined from the structure of the noise, and this approach to eliminating correlated measurement errors is subsequently shown to be a special case of the recently introduced maximum likelihood PCA (MLPCA).15 MLPCA will be contrasted to derivative preprocessing, and the efficiency of maximum likelihood PCR16 in correcting for baseline drift will be explored.
THEORY
Derivative Filters. In the chemical literature, the approximation of derivative spectra is most often accomplished by using finite differences or polynomial least-squares filters. The simplest method for obtaining derivatives of spectra is with simple differences. In this method the rate of change of the signal vector is approximated by finding the difference between the signals at adjacent channels in the spectral vector. This simple filter, although a reasonably accurate representation of the observed signal derivative, is typically undesirable due to
its extreme sensitivity to high-frequency noise. The transfer function for a difference filter is shown in Fig. 1a, and an example of its application to a spectrum is shown in Fig. 2. Examining the transfer function, we can see that there is substantial attenuation of low-frequency signals, and that the difference filter actually amplifies mid- and high-frequency regions, which typically lack chemical signals of interest. As a result of the objectionable response of these filters at higher frequencies, differencing methods are of limited utility unless the spectral data exhibit very high signal-to-noise (S/N) ratios [as is often said to be the case with near-infrared (NIR) measurements]. An alternative is to use differencing in conjunction with smoothing procedures. These two operations can, however, be achieved simultaneously with polynomial least-squares filters.17,18
Polynomial least-squares filters, known in the chemical literature as Savitzky–Golay filters, yield a least-squares estimate of the derivative over a window of points in the spectrum. The least-squares properties of these functions achieve a degree of low-pass filtering, while the derivative properties provide some relief from the low-frequency drift effects; the result is a form of bandpass filter. Transfer functions for a 3- and 13-point linear first-derivative least-squares filter are shown in Fig. 1a as examples. In Fig. 1b, 13-point linear first- and quadratic second-derivative transfer functions are compared, showing the greater low-frequency attenuation achieved by the higher derivative. Figure 2 is also included as a depiction of the characteristics of derivative spectra obtained by the application of these filters. In general, increasing the order of the derivative will result in a greater degree of low-frequency attenuation since the low-frequency cutoff for higher derivatives is essentially moved to higher frequencies. Wider filters will achieve greater reductions in the higher frequencies due to greater suppression of high-frequency components and a shift in the high-frequency cutoff to lower regions. The order of the polynomial function fitted to the data window is inversely related to the extent of high-frequency attenuation (i.e., higher-order fits retain more high-frequency components).
It can be shown that, provided that the raw data exhibit homoscedastic white noise, the noise level in the filtered data is given by

$$\sigma^2_{\text{filtered}} = \sigma^2_{\text{unfiltered}} \sum_i c_i^2 \tag{1}$$

where $\sigma^2_{\text{filtered}}$ and $\sigma^2_{\text{unfiltered}}$ represent the variance of the noise in the filtered and unfiltered signals, and $c_i$ is the ith filter coefficient. Under the requisite noise conditions, difference filters increase the noise variance by a factor of 2. In contrast, due to their bandpass properties, most Savitzky–Golay derivative filters actually reduce the level of noise. This is an important characteristic to note, since it is often said that the differentiation of spectra increases the noise. While statements such as these may convey the correct connotation (i.e., the classical signal-to-noise ratio is reduced by differentiation), the statement is erroneous as written. The bandpass nature of Savitzky–Golay derivative filters reveals that in most scenarios the noise is reduced; however, the low-frequency attenuation of these filters also substantially reduces the slowly varying chemical signals of interest. These properties often result in a reduced univariate S/N, which is typically defined as the ratio of the maximum signal in a spectrum [$\max(x)$, where $x$ is a spectral vector] to the standard deviation of the baseline noise ($\sigma_{\text{noise}}$), or

$$\mathrm{S/N} = \frac{\max(x)}{\sigma_{\text{noise}}} \tag{2}$$

It is necessary to reiterate, however, that Eq. 1 is valid only when the original signal is corrupted exclusively by homoscedastic white noise, which is clearly not the case in spectra exhibiting baseline drift. In addition, using the standard deviation of the baseline as a measure of noise when drift noise is present is particularly misleading, since this metric neglects correlations of the noise that are fundamentally important. There are also serious limitations to the univariate signal measure [$\max(x)$] when one wishes to make use of the multichannel nature of the data for multivariate analysis or calibration. This limitation is discussed to a greater extent below.
The Savitzky–Golay implementation of derivative filtering (particularly second derivatives) is the most popular in the literature to date. This popularity can be attributed to the desirable bandpass properties of these filters, as well as their simplicity and ease of use. For these reasons, the Savitzky–Golay method of differentiation was used in the discussions and experiments that follow.
Sensitivity and Selectivity Considerations. An often noted concern with the use of Savitzky–Golay filters is their rather unpredictable effect on the signal quality. This effect has been previously noted to be problematic when Savitzky–Golay smoothing is done prior to multivariate calibration,1 and similar perils exist with Savitzky–Golay derivative filtering.
The use of derivative filtering as a preprocessing tool for multivariate calibration requires consideration of the calibration procedure itself in the proper context. It would be erroneous to use such factors as the univariate S/N ratio in examining the effect of derivative preprocessing, since this univariate measure is rarely a valid indicator of the predictive success achievable. It is therefore necessary to consider the effect of derivative preprocessing on multivariate figures of merit,19–21 in particular, the sensitivities, selectivities, and signal-to-noise ratios.
The multivariate sensitivity (SEN) is a scalar measure describing the magnitude of the signal specifically attributable to the analyte of interest in the calibration system. It can be defined as the length of the net analyte signal vector (NAS) for the analyte

$$\mathrm{SEN} = \|\mathrm{NAS}\| \tag{3}$$

($\|x\|$ represents the Euclidean norm of the enclosed vector.) Two factors contribute to the sensitivity: the magnitude of the values in the spectra themselves and the similarity (or correlation) of the pure-component spectrum for the analyte of interest with all other interfering analyte spectra. It is apparent that the magnitude of spectral values can be changed simply by changing the scale of the y-axis or, in the case of derivative spectra, the x-axis. This property makes the sensitivity in the absence of context an arbitrary figure of merit, and thus one of little use in examining the effect of derivative filtering. The other principal factor contributing to the SEN for an analyte of interest is the correlation among the pure-component spectra, which is better characterized by the multivariate selectivity. The multivariate selectivity (SEL) contributes to the SEN but is a unitless measure of the extent to which interfering components obscure the signal of the analyte of interest. More specifically, the SEL is defined as

$$\mathrm{SEL} = \frac{\|\mathrm{NAS}\|}{\|s\|} = \frac{\mathrm{SEN}}{\|s\|} \tag{4}$$

where $s$ is the pure-component spectral vector at unit concentration. Since the length of the true NAS vector can never exceed the length of the true pure-component spectral vector, the selectivity can vary between 0 (complete spectral overlap, and no observable multivariate signal) and 1 (the signal of the analyte of interest is orthogonal to the signals from all other components, and $\mathrm{NAS} = s$). While a selectivity of 1 rarely occurs in practice, Eq. 4 suggests that, for the SEL to increase upon differentiation, the correlation of $s$ with the subspace defined by the interfering pure-component spectra must decrease. Given that the form that the derivative spectrum takes is specific to the frequency, as well as the location of the features contained in the original spectrum, it is difficult to know a priori whether the derivative spectra will be more or less correlated than the original data.
It is often said in the literature that differentiation of spectra enhances the subtle differences in the spectra, and it is assumed that this factor bears direct rewards in calibration. While this assumption may be true in certain circumstances, differentiation by Savitzky–Golay methods operates by suppressing the low-frequency character of a spectrum (and typically the very high-frequency signals as well). This process has the effect of not only
suppressing drift noise but also reducing the low-frequency character of the chemical responses in the spectrum. It is conceivable that, if the attenuated frequencies largely contribute to spectral overlap, the SEL may be enhanced by derivative filtering. However, if the lower frequencies are attenuated and they are important to the success of the calibration, then derivative filtering will obviously be detrimental to the calibration procedure.
The previously discussed problem with the multivariate sensitivity is that, without context, its value is rather arbitrary. The multivariate signal-to-noise ratio, however, is the ratio of the multivariate signal attributable exclusively to the analyte of interest, to the level of the noise corrupting the calibration data. It has been previously proposed that a suitable metric for this purpose can be defined as

$$\mathrm{S/N} = \frac{\|\mathrm{NAS}\|}{N} = \frac{\mathrm{SEN}}{N} \tag{5}$$

where $N$ ($= \sigma_{\text{noise}}$) is the level of white, homoscedastic noise corrupting the calibration space.19–21 Brown and Wentzell's definition for $N$ extends the use of Eq. 5 to the general case of heteroscedastic, correlated errors1 and is given by

$$N = \sqrt{v^T \Sigma v} \tag{6}$$

where $\Sigma$ is the error covariance matrix for the data, and $v$ is the contravariant vector (the NAS vector normalized to unit length)22 for the analyte of interest. When the noise is independent and identically distributed (iid) (and thus $\Sigma = I_n \sigma^2_{\text{noise}}$, where $I_n$ is the n-dimensional identity matrix), Eq. 6 reduces to the typical value, $\sigma_{\text{noise}}$. However, if the noise characteristics are non-iid, the value of $N$ directly depends on the orientation of the contravariant vector with respect to the directions of error covariance, and thus the noise level, $N$, changes from analyte to analyte within a calibration scheme.
From a theoretical perspective, Brown and Wentzell have previously shown that the application of symmetric Savitzky–Golay filters to multivariate calibration data (which includes even-numbered derivative filters) cannot improve the ratio of the sensitivity to noise level when the errors are iid and when the calibration space is well estimated.1 Therefore, if the errors in the spectra are uncorrelated (no drift noise), we cannot anticipate multivariate S/N enhancements by even-derivative Savitzky–Golay filtering under these conditions. Since these conclusions do not hold in cases in which drift noise is present, it is possible that multivariate S/N enhancements are achievable in the presence of correlated error.
Derivative Filters and Baseline Drift. As discussed in the Introduction, the low-frequency nature of drift noise is indicative of errors being correlated among channels. The greater the low-frequency components of the drift, the greater the extent of correlation. While simple baseline offsets are not typically classified as drift noise, random dc offsets are merely one type of correlated error structure in which all correlations are approximately the same.
The treatment of spectral data with derivative filters can be conveniently expressed in matrix form. The moving window convolution of the spectral data with the filter coefficients can be embodied by a filter matrix, $F$, filled band diagonally with the filter coefficients (see Ref. 1 for more detailed descriptions). This filter matrix can be applied to an individual spectral vector, $x$, as in

$$x_F = xF \tag{7}$$

where the dimensions of $x_F$ (the filtered data vector) and $x$ are $1 \times n$, the filter matrix $F$ is $n \times n$, and the spectral vectors are corrupted by measurement errors. With many mixture spectra forming the $m$ rows of a data matrix, $X$, the filtering operation can be carried out in a similar manner.

$$X_F = XF \tag{8}$$

Given that the observed data matrix, $X$, can be considered the sum of the true data, $X_0$, and a matrix of measurement errors, $E$, Eq. 8 can be expressed as

$$X_F = (X_0 + E)F = X_0 F + EF \tag{9}$$

where $X_0$ represents the noise-free data. Under normal circumstances it would be assumed that the elements of $E$ are normally distributed, and that for a given row, $e$, of $E$, $\mathcal{E}(e^T e) = \sigma^2 I_n$, where $\mathcal{E}$ symbolizes the expectation; that is, the error covariance matrix is diagonal. With correlated errors, however, we are assured this is not the case, and $\mathcal{E}(e^T e) = \Sigma$ (where the error covariance matrix, $\Sigma$, cannot be expressed as a multiple of the identity matrix; $\Sigma$ is symmetric, but not diagonal, and the errors may be homo- or heteroscedastic). With the application of a filter matrix to the data, and thus to the individual error vectors, we can express the error covariance matrix after filtering ($\Sigma_F$) as

$$\Sigma_F = \mathcal{E}(F^T e^T e F) \tag{10}$$

Since the filter matrices are constant, they can be factored from the expectation expression to yield

$$\Sigma_F = F^T \, \mathcal{E}(e^T e) \, F$$

$$\Sigma_F = F^T \Sigma F \tag{11}$$

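Equation 11 can be evaluated directly for a small example. The sketch below is illustrative only: the helper names `filter_matrix` and `filtered_covariance`, the 50-channel size, and the variance values are assumptions introduced here, and a simple difference filter stands in for the Savitzky–Golay filters used in this work. It shows that Eq. 11 reproduces Eq. 1 on the diagonal for iid noise, and that a random dc offset contributes nothing to the filtered covariance away from the edges.

```python
import numpy as np

def filter_matrix(coeffs, offsets, n):
    """Band-diagonal F such that (x @ F)[j] = sum_k coeffs[k] * x[j + offsets[k]].

    Terms that would reach outside the spectrum are dropped, so the columns
    nearest the ends of the spectrum are subject to edge effects.
    """
    F = np.zeros((n, n))
    for c, d in zip(coeffs, offsets):
        for j in range(n):
            i = j + d
            if 0 <= i < n:
                F[i, j] = c
    return F

def filtered_covariance(F, sigma):
    """Error covariance after filtering, Sigma_F = F^T Sigma F (Eq. 11)."""
    return F.T @ sigma @ F

n = 50
# Simple difference filter: x_F[j] = x[j + 1] - x[j].
F_diff = filter_matrix([-1.0, 1.0], [0, 1], n)

# Case 1: iid noise with variance 0.04 (an assumed value).
sigma_iid = 0.04 * np.eye(n)
cov_iid = filtered_covariance(F_diff, sigma_iid)

# Case 2: the same iid noise plus a random dc offset of variance 1.0 that is
# shared by all channels, so every element of Sigma is raised by 1.0.
sigma_offset = sigma_iid + 1.0 * np.ones((n, n))
cov_offset = filtered_covariance(F_diff, sigma_offset)

# The last column of F_diff is truncated at the edge, so compare the interior.
interior = slice(0, n - 1)
offset_contribution = cov_offset[interior, interior] - cov_iid[interior, interior]

print(np.diag(cov_iid)[:3])                  # 0.04 * sum(c_i^2) = 0.08, as in Eq. 1
print(np.abs(offset_contribution).max())     # effectively zero
```

Note that the filtered iid covariance in this example is not itself diagonal: adjacent difference values share one measurement, which gives a covariance of -0.04 between neighboring filtered channels.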
Therefore, the efficiency of the applied derivative filter in eliminating baseline drift can be appraised by examining the structure of $\Sigma_F$ and considering how closely it approximates iid conditions. If the derivative filter is completely successful in removing error covariance (and thus drift noise), then $\Sigma_F$ will be a diagonal matrix.
Optimal Treatment of Baseline Drift. In data that are cured of drift noise by derivative filtering, the filtered error covariance matrix, $\Sigma_F$, will be diagonal. If the filtered noise is also desired to be homoscedastic, then $\Sigma_F$ must equal $\sigma^2 I_n$. Dropping the proportionality constant (which can be viewed simply as a scaling factor) and substituting this relation into Eq. 11 leads to

$$I_n = F_o^T \Sigma F_o \tag{12}$$

$$(F_o^T)^{-1} F_o^{-1} = \Sigma, \qquad F_o F_o^T = \Sigma^{-1} \tag{13}$$

where $F_o$ indicates that the filter is now optimally suited to remove baseline drift. Therefore, for derivative filters to perform optimally in removing drift noise, the condition described in Eq. 13 must be met. Clearly, this condition could occur only in rare circumstances due to the necessary confines on the structure of Savitzky–Golay derivative filter matrices and the fact that derivative preprocessing makes no use of any available error covariance information. Therefore, a derivative filter is almost always ensured to be suboptimal in removing baseline drift in any given situation.
Equation 13 suggests that the optimal removal of drift noise requires knowledge of the error covariance structure of the data. Error covariance information has recently been utilized in maximum likelihood principal components analysis, which is a generalization of traditional PCA for instances in which non-iid errors are present.15 Unlike conventional PCA and PLS, MLPCA makes no assumptions regarding the independence or homoscedasticity of the measurement errors, and it yields the maximum likelihood solution to a principal component subspace regardless of the properties of the measurement errors. Since derivative preprocessing is principally performed to change the properties of the measurement errors to conform to the standard assumptions of methods such as PCA and PLS, it is apparent that MLPCA should require no preprocessing for drift noise, since it is ensured to yield the maximum likelihood solution to the principal component subspace regardless of the amount or type of noise that corrupts the spectral data.
The most general form of MLPCA makes no implicit assumptions regarding the structure of the measurement errors in the row or column domain (normal distributions of the errors are, of course, assumed).15,23 Errors may be correlated from channel to channel in the spectral domain, or correlated from sample to sample, or both. In addition, the errors are not assumed to be homoscedastic, and the error structure in one row or column is presumed to be unrelated to the error structure elsewhere. The maximum likelihood solution under these conditions, although realizable, is complex and numerically cumbersome. A simplification of this very general case is to assume that the error covariance predominates in one domain only (as can often be rationalized in first-order calibration data), and that error covariance is negligible in the other domain.23 A further assumption can be used when the error covariance structure does not change significantly from spectrum to spectrum, or sample to sample. These two often useful assumptions immensely simplify the maximum likelihood solution to PCA in practice. In many spectroscopic calibration cases these simplifications are valid, since drift noise is often introduced by nonsample-specific phenomena such as source flicker or fiber-optic cable flexing; however, the analyst retains the option to use the general case of MLPCA in conditions that are believed to deviate significantly from the above simplifications.
The maximum likelihood solution with equal row error covariance can also be derived from the filtering perspective. Returning to Eq. 9, and recalling that the error covariance matrix, $\Sigma$, is by definition symmetric, it is evident that a singular value decomposition of $\Sigma^{-1}$ can be expressed as

$$\Sigma^{-1} = U S^2 V^T = U S S U^T = (US)(US)^T \tag{14}$$

since the left and right singular vectors are identical. Substitution for $\Sigma^{-1}$ from Eq. 13 into Eq. 14 yields

$$F_o F_o^T = (US)(US)^T \tag{15}$$

and thus,

$$F_o = US \tag{16}$$

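As a numerical check of Eqs. 12 to 16, the sketch below builds $F_o = US$ from the singular value decomposition of $\Sigma^{-1}$ and confirms that it converts the error covariance to the identity. The covariance matrix used here (exponentially decaying correlation plus a small white-noise term) and the function name `optimal_filter_from_inverse` are assumptions made purely for illustration; they are not the error structures studied in this work.

```python
import numpy as np

def optimal_filter_from_inverse(sigma):
    """Return F_o = U S, where Sigma^{-1} = U S^2 U^T (Eqs. 14 and 16)."""
    sigma_inv = np.linalg.inv(sigma)
    # numpy returns the singular values of Sigma^{-1}, i.e. the diagonal of S^2.
    U, s_squared, _ = np.linalg.svd(sigma_inv)
    return U @ np.diag(np.sqrt(s_squared))

n = 30
idx = np.arange(n)
# Hypothetical drift-like covariance: correlation decays as 0.9^|i - j|,
# with a small iid term added to the diagonal.
sigma = 0.5 * 0.9 ** np.abs(idx[:, None] - idx[None, :]) + 0.01 * np.eye(n)

F_o = optimal_filter_from_inverse(sigma)

# Eq. 12: the filtered error covariance should be the identity matrix.
whitened = F_o.T @ sigma @ F_o
print(np.abs(whitened - np.eye(n)).max())

# Eq. 13: F_o F_o^T should reproduce the inverse covariance.
print(np.abs(F_o @ F_o.T - np.linalg.inv(sigma)).max())
```

The resulting matrix is dense rather than band diagonal, consistent with the remark that it cannot be applied as a moving-window convolution.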
Thus, the optimally designed filter matrix, $F_o$, is easily determined provided that the error covariance matrix is available. It should be noted that this optimal filter matrix will not be of the typical form of a least-squares polynomial filter matrix (band diagonal and symmetric/antisymmetric) and cannot be implemented through a convolution operation with spectra. Although the matrix cannot be considered a filter in the traditional sense, the term "filter matrix" will still be used for convenience. With $F_o$ determined from Eq. 16, the optimal drift-noise filter can be applied to the spectral data in the standard fashion:

$$Z_F = X F_o = X(US) \tag{17}$$

where $Z_F$ indicates that the spectral data have been operated on by the filter matrix. Conceptually, this filtering process can be thought of as eliminating (in the expectation) all the drift noise present in the spectral data by rotating the spectral vectors into directions in which the error is uncorrelated. Once in this orientation, standard PCA for a chosen rank, $p$, can be employed on $Z_F$, resulting in $\hat{Z}_F$.

$$Z_F \xrightarrow{\ \text{PCA},\,p\ } \hat{Z}_F \tag{18}$$

With rank reduction achieved by PCA, the spectral vectors can be rotated back to their approximate original positions by the inverse operation

$$\hat{X}_F = \hat{Z}_F (US)^{-1} = \hat{Z}_F (S^{-1} U^T) \tag{19}$$

where $\hat{X}_F$ is the rank $p$ maximum likelihood solution to the PCA of the original data. Additional numerical concerns, namely the stable inversion of the error covariance matrix for Eq. 14, can be addressed by obtaining the optimal filter matrix from the noninverted error covariance matrix. The required adaptation is simply that

$$(\Sigma^{-1})^{-1} = (U S^2 U^T)^{-1}$$

$$\Sigma = U S^{-2} U^T \tag{20}$$

and thus, the optimal filter matrix, when $U$ and $S$ are calculated from the noninverted error covariance matrix, is given by

$$F_o = U S^{-1} \tag{21}$$

This alteration in no way changes the rotation itself, since the rotation of a subspace ($\Sigma$) will produce an identical rotation of a subspace necessarily orthogonal to it ($\Sigma^{-1}$), and it can be easily shown by algebraic manipulation that

$$(US)^T \Sigma^{-1} (US) = (US^{-1})^T \Sigma (US^{-1}) \tag{22}$$

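The sequence of Eqs. 17 to 21 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names, the simulated three-component data, and the covariance matrix are hypothetical, and the PCA step is taken to be a truncated singular value decomposition without mean centering. The filter is built from the noninverted covariance, so $U$ and $S$ here come from $\Sigma = U S^2 U^T$ and the back-rotation uses $F_o^{-1} = S U^T$.

```python
import numpy as np

def optimal_filter(sigma):
    """Return F_o = U S^{-1} (Eq. 21) and its inverse S U^T.

    U and S are taken from the noninverted covariance, Sigma = U S^2 U^T,
    so no explicit matrix inversion is required.
    """
    U, s_squared, _ = np.linalg.svd(sigma)
    s = np.sqrt(s_squared)
    F_o = U / s            # divides column k of U by s[k], i.e. U @ diag(1/s)
    F_o_inv = (U * s).T    # equals diag(s) @ U.T
    return F_o, F_o_inv

def mlpca_equal_row_covariance(X, sigma, p):
    """Rank-p estimate of X when every row shares the error covariance sigma."""
    F_o, F_o_inv = optimal_filter(sigma)
    Z = X @ F_o                                   # filter the data (Eq. 17)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V_p = Vt[:p].T                                # first p loading vectors of Z
    Z_hat = Z @ V_p @ V_p.T                       # rank-p PCA estimate (Eq. 18)
    return Z_hat @ F_o_inv                        # rotate back (Eq. 19)

rng = np.random.default_rng(0)
n_channels, n_samples, p = 40, 20, 3
idx = np.arange(n_channels)

# Hypothetical drift-like covariance with a small iid term on the diagonal.
sigma = (0.05 * 0.95 ** np.abs(idx[:, None] - idx[None, :])
         + 0.001 * np.eye(n_channels))

# Noise-free rank-3 data: three Gaussian bands mixed with uniform concentrations.
centers = np.array([12.0, 20.0, 28.0])
pure = np.exp(-0.5 * ((idx[None, :] - centers[:, None]) / 4.0) ** 2)
conc = rng.uniform(0.0, 1.0, size=(n_samples, p))
X0 = conc @ pure

# Correlated errors with covariance sigma, drawn through a Cholesky factor.
E = rng.standard_normal((n_samples, n_channels)) @ np.linalg.cholesky(sigma).T

X_hat = mlpca_equal_row_covariance(X0 + E, sigma, p)
X_hat_clean = mlpca_equal_row_covariance(X0, sigma, p)
print(np.abs(X_hat_clean - X0).max())   # noise-free rank-3 data are reproduced
```

Working from the singular values of $\Sigma$ itself means that the only inverse needed is the elementwise reciprocal of $S$, which is the practical appeal of Eq. 21.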
Thus, for situations in which there is equal row error covariance, this simple method of performing MLPCA avoids the inversion of the error covariance matrix while achieving the optimal baseline drift correction and rank-$p$ PCA simultaneously.
While MLPCA can be considered a preprocessing step and PCA combined, it, like PCA, can also be used directly in multivariate calibration.16 It is therefore proposed that MLPCA is an optimal drift-noise preprocessing method, and its regression counterpart, MLPCR, is an optimal regression method for use in calibration and prediction systems corrupted by drift noise. These methods are optimal in the statistical sense, in that they generate the most likely (maximum likelihood) principal component subspaces based on (1) the spectral data at hand and (2) the knowledge the analyst has of the error structure of the data (via replication or otherwise).15 Unlike conventional PCA, the projections of the spectra onto the MLPCA subspace do not necessarily occur orthogonally; the obliqueness of the projection is determined by the magnitudes and directions of error variance-covariance corrupting the calibration space. Therefore, predictions from the MLPCR calibration model are also based on both the estimated calibration space and the estimated error covariance matrix. An orthogonal projection can be made in the "filtered" domain, however, since in this orientation the errors are effectively iid (within experimental error). The procedure for the prediction step, then, can be given as

$$Z_{\text{pred},F} = X_{\text{pred}} F_o$$

$$\hat{Z}_{\text{pred},F} = Z_{\text{pred},F} V_Z V_Z^T$$

$$\hat{X}_{\text{pred},F} = \hat{Z}_{\text{pred},F} (F_o)^{-1} \tag{23}$$

and thus

$$\hat{c}_{\text{pred}} = \hat{X}_{\text{pred},F} (\hat{X}_F)^{+} c_{\text{cal}} \tag{24}$$

where the superscript $+$ denotes pseudoinversion. The calibration and prediction aspects of MLPCR (derived as the optimal drift-noise filter) are summarized for convenience in Fig. 3.
FIG. 3. A summary of the MLPCR algorithm when derived as the optimal drift-noise correction filter.
EXPERIMENTAL
Simulations. To study the process of drift noise reduction, we carried out simulation studies using three calibration methods: PCR, derivative preprocessed spectra with PCR (derivative PCR), and MLPCR. The differentiation of spectra was achieved by using the method of Savitzky and Golay17 with a variety of widths and orders for the filter function. All the simulated calibration systems in this work involved three spectrally active chemical components whose pure-component spectra were generated either according to controlled criteria or randomly. Regardless of the type of spectral vectors used, the pure-component spectra were normalized to unit length to standardize the simulated responses. Calibration sets consisted of 20 mixture spectra in which the concentrations of each of the three components were drawn randomly from a uniform distribution between zero and unit concentration. The prediction sets consisted of 100 mixture spectra with concentrations also drawn from a uniform distribution between 0 and 1. Owing to the fact that the simulated calibration and prediction sets were constructed from rank three systems, the calibration model dimensions were preselected to be 3 in all cases. While the introduction of correlated measurement error can influence the pseudorank of the data to some extent, its introduction resulted in no appreciable differences in the observed pseudorank of the simulated calibration systems.
Controlled Spectral Data. In studies in which it was desirable to fix the correlation of the pure-component spectra (spectral angle), each pure-component spectrum was generated from a single Gaussian peak ($\sigma_{\text{peak}} = 10$ spectral channels) placed in a 200-channel spectrum so that the correlation between spectrum 2 and all other interfering spectra was 45°.
This condition implies that the spectral angle between components 1 and 3 was greater than 45°, and so component 2 was used as the analyte of interest in the bulk of this work since it would be the most difficult case. The results for components 1 and 3 were almost identical to the results for component 2 in all cases, and hence, only the results for component 2 are shown in subsequent figures. A set of noise-free calibration mixture spectra generated under these conditions is shown in Fig. 4a, with the inset showing the pure-component spectra. The baseline regions on the ends of the spectra were useful to minimize edge effects from the filtering processes.

Randomly Generated Spectral Data. In the examinations of the multivariate figures of merit for differentiated spectra, it was necessary to minimize the effect of the shape of the spectra on the studies. This was achieved, at the expense of spectral angle control, by generating each pure-component spectrum from four additive Gaussian bands which were centered at randomly chosen locations in the spectral domain. If very broad Gaussian bands (e.g., σ_peak = 25 channels) are used in generating the spectra, then the spectra tend to consist mostly of very low-frequency signals and to be broad, generally featureless, and thus highly overlapped. In contrast, spectra generated with narrow Gaussian bands (e.g., σ_peak = 2 channels) tend to have higher frequency signals, and thus sharper spectral features. Figure 4b shows a set
of noise-free calibration spectra generated from pure-component spectra that were randomly generated by this method, using a Gaussian bandwidth (σ_peak) of 25 channels. The inset shows the pure-component spectra in this instance.

FIG. 4. (a) Twenty noise-free calibration spectra generated according to controlled spectral conditions with Gaussian bandwidths of 10 channels (σ_peak = 10) and spectral angles of 45°. (b) Twenty noise-free calibration spectra generated according to random spectral conditions with Gaussian bandwidths of 25 channels (σ_peak = 25). (c) An example of the controlled calibration data corrupted by drift noise introduced with a moving-average filter 51 channels in width.

Introduction of Correlated Measurement Errors. Typically, the noise in simulation studies is desired to be iid, allowing the measurement errors to simply be generated from a random number generator such as RANDN in Matlab. However, accurate simulations of baseline drift require generation of correlated measurement errors. Provided that one preselects the structure of the correlated errors and embodies these characteristics in an error covariance matrix, a simple rotation can be found which correlates errors which were originally independent.²⁴,²⁵ This rotation can be found in the manner used above to effectively "decorrelate" the measurement errors in MLPCA. While this procedure can be easily accomplished, it requires the user to construct an error covariance matrix. In the simulated experiments performed in the course of this work, a "ridged" error covariance matrix was used, in which the error covariance tends to be greatest between neighboring channels in the spectrum. This approach is representative of what one would expect if certain instrument characteristics were reflected in the error covariance structure, such as cross-talk between spatially proximate channels, and source flicker. A further benefit is that these error covariance structures can be easily generated and systematically altered by using simple smoothing filters.

The measurement errors for the calibration and prediction simulations in this work were generated in two steps. First, noise was generated randomly from a normal distribution (mean of 0, standard deviation of 1). The ridged error covariance matrices discussed above were subsequently generated by applying simple Savitzky–Golay smoothing filters. Generation of the correlated errors in this fashion requires no rotations, but rather a straightforward smoothing operation with moving-average smoothing filters of varying widths applied to the original iid measurement errors. Equation 25 conveniently allows the calculation of the error covariance matrix resulting from this filtering operation as

$$\boldsymbol{\Sigma} = \sigma_{\mathrm{noise}}^{2}\,\mathbf{F}^{\mathrm{T}}\mathbf{F} \qquad (25)$$
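Equation 25 can be checked numerically with a small sketch. The function names and the truncated-window edge handling below are assumptions made for illustration; spectra are treated as row vectors, so a spectrum x is smoothed as the product xF.

```python
def moving_average_matrix(n_channels, width):
    """Filter matrix F for a centred moving average of odd width.
    Column j holds the weights used to compute smoothed channel j.
    Windows are simply truncated at the ends (an assumption)."""
    half = width // 2
    return [[1.0 / width if abs(i - j) <= half else 0.0 for j in range(n_channels)]
            for i in range(n_channels)]

def error_covariance(F, noise_var=1.0):
    """Eq. 25: Sigma = noise_var * F^T F for iid input noise of variance noise_var."""
    n = len(F)
    return [[noise_var * sum(F[k][i] * F[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

F = moving_average_matrix(30, 5)
Sigma = error_covariance(F)          # unit-variance white noise before smoothing
```

For an interior channel and a 5-point moving average, the variance is 1/5 of the white-noise variance and the covariance falls linearly to zero at a lag of 5 channels, which is the ridged structure described above. Wider filters spread the ridge further from the diagonal.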
The amount of correlation was controlled by altering the width of the smoothing filter used on the white noise, since wider smoothing windows result in a larger degree of correlated error. With the proper error covariance structure established, the error matrices were scaled to the desired magnitude. In simulation studies in which it was necessary to regulate the magnitude of the noise variance (such as comparison studies of PCR, derivative PCR, and MLPCR under different levels of correlated error), the noise variance was kept constant by scaling the errors so that the standard deviation of the errors, regardless of the smoothing filter used, was 0.005. Throughout this work, when the level of correlated error is discussed, it will be indicated by the width of the moving-average smoothing filter used to generate it. Figure 4c shows a set of calibration mixtures generated from controlled spectral data and corrupted with correlated measurement errors (σ = 0.005) with the use of a smoothing filter width of 51 channels.

Experimental Data. Experimental data, consisting of diffuse reflectance measurements on 16 acrylonitrile-butadiene-styrene (ABS) formulated resin samples, were supplied by Dow Chemical Company. Measurements were obtained with a Bomem MB155S spectrometer, outfitted with a DiffusIR™ attachment, designed to allow large-area (13 cm²) reflectance sampling on coarse materials. The indium-arsenide detector was thermoelectrically cooled to minimize temperature fluctuation effects. Petri dishes were filled to a depth of approximately 1 cm with the ABS samples, and spectra were acquired through the bottom of the containers with the use of the spectrum of a Spectralon disk as a reference. Five repeat analyses were performed for each sample (each in a different dish) in the region from 10 005–3695 cm⁻¹ with the use of 16 cm⁻¹ resolution and 128 scans.
Initial data exploration found several spectra showing unusually high leverages and concentration residuals. These seven spectral vectors were excluded from building subsequent calibration models, leaving a reduced calibration set of 73 spectra. As expected, calibration performance was significantly enhanced by these deletions. While wavelength selection procedures may have revealed regions of the spectra that were more useful than others, the intended use of the data was the comparison of various preprocessing methods for multivariate calibration. Since it is only the relative calibration performances that are of interest, wavelength selection was deemed an unnecessary procedure. Further model selection details are discussed below, where relevant.

APPLIED SPECTROSCOPY

FIG. 5. Figures of merit studies on sets of pure-component spectra. (a) Sample pure-component spectra with very broad features (low frequency) for extensive studies of the (b) SEL and (c) SEN/N before and after derivative preprocessing with a five-point quadratic second-derivative filter. (d) Sample spectra from the same studies conducted on pure-component spectra with relatively narrow spectral features (higher frequency composition), and the resulting (e) SEL and (f) SEN/N changes as a result of derivative preprocessing.

Computations. All computations performed in the course of this work were carried out on a Sun Microsystems Ultra 60 with four 300 MHz CPUs and 1 GB of RAM. All scripts were written in house and executed in MATLAB v.5.2 and 5.3 (The Mathworks, Natick, MA) for the Unix platform.

RESULTS AND DISCUSSION

Derivative Preprocessing. Derivative Filters and Figures of Merit. In order to explore the effect of derivative filtering on chemical signals and multivariate calibration figures of merit, 500 sets of three pure-component spectral vectors were generated with random features as described in the Experimental section. To simulate spectra with relatively broad features, we selected the Gaussian bandwidth to be 75 channels. A typical set of pure-component spectra generated under these conditions is shown in Fig. 5a. Access to the noise-free pure-component spectra allowed calculation of the true figures of merit for each system, so each component's selectivity was measured before and after the spectral data were differentiated. Figure 5b shows the results obtained from these simulations for the first analyte of the three with the use
of a 5-point quadratic second-derivative filter. (The results were similar for all three components, so only the results for component 1 are shown.) The SEL values (Fig. 5b) all improved to some degree after derivative treatment compared to the results obtained for the raw spectral data. When the simulation was repeated with spectra exhibiting higher frequency components (Gaussian bandwidth of 10 channels), the beneficial effects of derivative filtering were much less pronounced. Typical pure-component spectra and the selectivity simulation results are shown in Fig. 5d and 5e, respectively. Multivariate S/N ratios were also calculated in these studies and are summarized for the two scenarios (broad- and narrow-featured spectra) in Fig. 5c and 5f. The observable results of the S/N studies seem to contradict the SEL studies and are highly dependent on the characteristics of the signals.

Although it is difficult to make definitive generalizations from these simulations alone, several points can be made. The effect of derivative filtering on the multivariate selectivity of an analyte is difficult to predict even in the absence of drift noise. The change in SEL is related to the frequency composition of the pure-component spectra and to the characteristics of the derivative filter used. Savitzky–Golay derivative filters will attenuate both low- and high-frequency components of the signals, and the change in SEL resulting from these signal modifications is entirely dependent on the location of the bandpass region of the filter with respect to the frequencies in the pure-component spectra that are important for successful calibration and prediction. When the additional complication of correlated measurement error is considered in the calculation of the multivariate S/N, the results are even more difficult to predict.
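The bandpass behavior mentioned above can be seen directly for the 5-point quadratic second-derivative filter. The sketch below uses the standard tabulated Savitzky–Golay coefficients (2, −1, −2, −1, 2)/7 for unit channel spacing; the function names and the test signals are assumptions made for the example.

```python
import cmath
import math

# Savitzky-Golay 5-point quadratic second-derivative weights (unit channel spacing).
SG_5PT_QUAD_D2 = [2 / 7, -1 / 7, -2 / 7, -1 / 7, 2 / 7]

def apply_filter(signal, coeffs):
    """Slide the filter along the signal; only fully overlapped positions are returned."""
    w = len(coeffs)
    return [sum(c * signal[i + k] for k, c in enumerate(coeffs))
            for i in range(len(signal) - w + 1)]

def gain(coeffs, freq):
    """Magnitude response at a frequency in cycles per channel (0 to 0.5)."""
    half = len(coeffs) // 2
    return abs(sum(c * cmath.exp(-2j * math.pi * freq * (k - half))
                   for k, c in enumerate(coeffs)))

baseline = [0.3 + 0.01 * ch for ch in range(50)]   # offset plus linear drift
parabola = [float(ch * ch) for ch in range(50)]    # exact second derivative is 2

flat = apply_filter(baseline, SG_5PT_QUAD_D2)      # offset and slope are removed
curv = apply_filter(parabola, SG_5PT_QUAD_D2)      # curvature is recovered
```

The response is zero at zero frequency, largest at intermediate frequencies, and lower again at the Nyquist frequency (0.5 cycles per channel), so whether a given analyte signal survives depends on where its frequency content falls relative to that band.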
The derivative filter bandpass, the frequency composition of the signals, and now the relation of the error covariance matrix to the contravariant vectors for the analytes are all of fundamental importance, making it nearly impossible to state with any certainty a priori whether anything is to be gained by derivative filtering from a figures of merit standpoint. As a result, the analyst is resigned to using a trial-and-error approach to carefully match the filter bandpass to the situation at hand, a noted drawback of derivative preprocessing.

Derivative Filters and Measurement Errors. It was proposed in the theoretical portion of this work that derivative filters can be thought of as attempting to diagonalize the error covariance matrix for the data and render the noise uncorrelated. Figure 6a shows a vector of noise which shows significant levels of correlation. In Fig. 6b, this noise vector has been differentiated by using a 5-point quadratic second-derivative filter, and in Fig. 6c, a 15-point quadratic second-derivative filter function was used. As far as inspection can tell, the noise treated with the 5-point filter appears to be much less correlated than the original. Although the treated noise resulting from the 15-point filter looks to be of higher frequency than the raw data, some low-frequency drift is observed to persist.

Although this example illustrates the visual reduction in correlated errors, a more rigorous evaluation must come through error covariance matrix comparisons. In Fig. 6d, the error covariance matrix for the drift-corrupted data is shown as a contour plot. The application of the 5-point quadratic second-derivative filter to the raw
FIG. 6. (a) A noise sequence showing significant levels of drift noise. (b) A derivative spectrum of this noise acquired using a five-point quadratic second-derivative filter. (c) A derivative spectrum of (a) acquired using a 15-point quadratic second-derivative filter. (d, e, and f) Error covariance matrices determined experimentally from 50 replicate spectra of the original and derivative-treated spectra.

FIG. 7. Noise power spectra (NPS) for raw drift noise, raw noise treated with a 13-point quadratic second-derivative filter, and raw noise treated with a difference filter.
data results in the error covariance matrix shown in Fig. 6e. From examination of the error covariance matrices, it is clear that error variation in this filtered data is almost exclusively characterized on the diagonal, implying that very little correlated error remains in the derivative-filtered signals. The off-diagonal lines that can be observed in this error covariance matrix result from the suboptimal treatment of the observed error covariance matrix by derivative filtering. Figure 6f shows the error covariance matrix resulting from data treated with the 15-point second-derivative filter. Substantial error covariance remains after derivative treatment in this instance, and it is apparent that the smaller sized filter does a better job of eliminating correlations among the errors and thus reducing the contribution of baseline drift.

These observations can be rationalized from a theoretical standpoint. Figure 7 shows the noise power spectrum (NPS) for the raw noise and after filtering with a difference filter and a 13-point quadratic second-derivative filter. The NPS of the raw noise shows the low-frequency dominance characteristic of drift noise. The 13-point quadratic second-derivative filter treatment results in a colored NPS, with frequencies in the 0.1–0.2 range dominating. The difference filter, however, leaves essentially white noise. From a drift noise perspective, then, the narrower derivative filters tend to be more successful at reducing the low-frequency dominance in the noise
power spectra without introducing other "colors" in the noise.

Maximum Likelihood PCA and Maximum Likelihood PCR. The application of MLPCA and MLPCR to drift-noise corrupted calibration data was studied under two conditions: using the true error covariance matrix, and using estimates of the error covariance matrix.

MLPCR with the True Error Covariance Matrix. Since the correlated error was introduced in the simulated data with known characteristics (see Experimental), it was possible to use this information directly in MLPCA and MLPCR. Figure 8 shows a set of simulated controlled mixture spectra which are heavily impaired by drift noise before and after MLPCA drift correction. For comparison, the PCA reconstruction of the data at the same rank is also shown. It is clear that the level of correlated error has been significantly reduced relative to the uncorrected data, as well as to PCA, and the wildly fluctuating drift noise has been corrected to a remarkable extent.

To properly compare the proposed method of drift correction using MLPCA to derivative methods, we carried out large simulation studies in which the level of correlated noise was varied while monitoring the calibration performance of PCR, derivative PCR (with a variety of derivative filters), and MLPCR. These calibration sets were generated from controlled spectral data with the standard deviation of the noise fixed at 0.005 and are consequently similar to the spectra shown in Fig. 4c. In Fig. 9, the root mean square errors of prediction (RMSEPs) for MLPCR and PCR and a variety of derivative PCR methods are shown. Figure 9a displays sample results for derivative PCR with linear first-derivative spectra (varying filter widths), while Fig. 9b shows the result of using quadratic second-derivative filters. The two figures are very similar, except for the behavior of the very narrow filters.
With first-derivative filters, greater smoothing is achieved (although less drift reduction), and so the very narrow filters still allow derivative PCR
FIG. 8. (a) Twenty calibration spectra generated under controlled conditions and corrupted by drift noise using a moving-average filter width of 95. (b) The data projected onto the rank 3 principal component subspace. (c) The data after rank 3 MLPCA drift correction using the known error covariance structure.
to perform reasonably well. Second-derivative filters, however, achieve a much lower degree of smoothing at narrow filter sizes than do first-derivative filters, and so derivative PCR calibration models built with narrow second-derivative filters have the potential to be heavily impaired by noise, an effect observed in these simulations.

When the level of correlated error is minimal, MLPCR can be seen to provide no enhancements over conventional PCR. This is, of course, expected since in the presence of uncorrelated measurement errors MLPCR reduces to simple PCR. In contrast, derivative PCR often performs considerably worse than PCR when there is little or no error covariance. With no correlated error present, derivative filtering can achieve no improvements in calibration performance by drift reduction. With the derivative filter matrix being applied to an effectively iid error covariance matrix, the derivative filter is, in these cases, simply introducing correlated measurement error into the system and, therefore, creating noise conditions that are not well suited for PCR. These effects, coupled with the possible degradations in selectivity and SEN/N that can result from derivative filtering previously noted, could lead to the observed poor performance of these derivative filters with low levels of drift noise.

As the level of correlation among the errors becomes more significant, both derivative PCR and MLPCR are observed to surpass conventional PCR in predictive success. The enhancements observed for derivative PCR should result from drift reduction by Eq. 11, with possible contributions in SEL and SEN/N, and the improvements observed for MLPCR over PCR arise from
FIG. 9. Simulation results comparing the performance of PCR to derivative PCR and MLPCR on controlled spectral data: (a) PCR and MLPCR compared to linear first-derivative filters, and (b) PCR and MLPCR compared to quadratic second-derivative filters. The derivative filter sizes range from 5 channels to 35 channels.
MLPCR's use of the error structure in the subspace estimation. In all simulations performed under these conditions, the performance of MLPCR was consistently found to be comparable to or better than derivative PCR at its best (optimal filter characteristics).

MLPCR with Estimates of the Error Covariance Matrix. Since the true error covariance matrix for the data is never known in practice, simulation studies were conducted to compare MLPCR using estimates of the error covariance to derivative PCR.

Error covariance estimates can be easily obtained from a set of replicate spectra. If one assumes that the error covariance matrices for each set of replicates are approximately the same (an assumption known to be valid in the simulation studies), a pooled estimated error covariance matrix can be obtained by averaging all the replicate-estimated error covariance matrices. This pooled error covariance matrix, in turn, can be used in MLPCR for drift-noise correction procedures and calibration. To test the utility of this approximation, we generated replicate calibration spectra for each of the 20 calibration samples. Since the replicate spectra for each sample are used in the estimation of the error covariance structure, the number of replicate spectra that were used per sample was used as the control for the precision of the estimated error covariance matrix. Regardless of the number of
FIG. 10. Performance of MLPCR as a function of the number of replicates of each calibration sample used to estimate the pooled error covariance structure from all samples. The performances of PCR and a few derivative PCR methods are also shown for reference.
sample replicate spectra used to estimate each sample's error covariance matrix, all 20 sample error covariance estimates were subsequently averaged to yield a pooled error covariance matrix estimate. The replicate spectra were used only to estimate the error covariance structure and were not used in the actual construction of the calibration model itself.

RMSEPs for MLPCR under these conditions are shown in Fig. 10. The performances of some derivative PCR models are also included for direct comparison. It is apparent that, with this method of pooling the error covariance estimates, MLPCR still performs extremely well under these conditions. Although the performance of MLPCR appears to be optimal at a certain number of replicates (~6 or 7), this minimum is merely a statistical aberration and does not indicate a trend. Pooling the error covariance estimates in this fashion still provides reasonable estimates of the error covariance matrices provided that there are a sufficient number of samples available.

To examine the performance of MLPCR when very poor estimates of error covariance were available, we simulated an experiment in which the analyst estimates the characteristics of the error covariance from the repeated analysis of a single sample. The number of replicate measurements used in the estimate was used as a control criterion. While the effect of the validity of the estimate on the performance of MLPCR is a complex matter, some qualitative observations can be made. The results shown in Fig. 11 are for low, medium, and high levels of correlated errors. It is clear that with lower levels of correlated error, a large number of replicates are needed to achieve prediction errors significantly below those of PCR, while significant amounts of correlated error greatly reduced the number of replicates required for MLPCR to perform better than PCR.
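The pooling of replicate-based covariance estimates described above can be sketched as follows. The function names, the n − 1 divisor, and the equal weighting of samples (which presumes the same number of replicates per sample) are assumptions made for illustration, and the tiny two-channel data set is hypothetical.

```python
def covariance(replicates):
    """Error covariance estimate from replicate spectra of ONE sample.
    The sample's own mean spectrum is removed, so only measurement error remains."""
    n = len(replicates)
    p = len(replicates[0])
    mean = [sum(r[j] for r in replicates) / n for j in range(p)]
    dev = [[r[j] - mean[j] for j in range(p)] for r in replicates]
    return [[sum(d[i] * d[j] for d in dev) / (n - 1) for j in range(p)]
            for i in range(p)]

def pooled_covariance(replicate_sets):
    """Average the per-sample covariance estimates with equal weights."""
    covs = [covariance(reps) for reps in replicate_sets]
    p = len(covs[0])
    return [[sum(c[i][j] for c in covs) / len(covs) for j in range(p)]
            for i in range(p)]

# Hypothetical data: two samples, three replicates each, two channels.
sample_a = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
sample_b = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
pooled = pooled_covariance([sample_a, sample_b])
```

Because each sample is centered on its own mean before the covariance is formed, differences in composition between samples do not leak into the pooled error estimate.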
Presumably, subtle drift noise is of little detriment to PCR, and additionally, these subtleties are difficult to characterize from replicates because of their relatively low magnitude, whereas large correlations are significantly deleterious to PCR and can
FIG. 11. The performance of MLPCR as a function of the number of replicates of a single sample used to estimate the error covariance matrix (no pooling) and the performance of MLPCR on the same data when the error covariance estimates were first treated with a 25-point block smoothing filter. Subplots a, b, and c correspond to low, medium, and high correlation levels (smoothing filter widths of 19, 59, and 99).
be reasonably estimated with only a few replicates. It will also be noted that, with low and moderate correlation, MLPCR performs comparably to PCR when only a few replicates are available. This outcome is likely to be due to poor estimation of the covariance matrix, which effectively is equivalent to the iid normal case.

The hazard with using poorly estimated error covariance matrices is that the estimated error covariance matrix has a very low signal-to-noise ratio. The performance of MLPCR with poorly estimated error covariance matrices can be improved, to some degree, by smoothing the error covariance estimate with a simple moving-average filter. Given that the information in the error covariance matrices is two-dimensional (2D), it is recommended that a 2D smoothing filter (a block smoother) be used. In 2D smoothers, the points in a block of size w × w (where w is preferably odd) are averaged to obtain a smoothed value at the middle of the block. This method can improve the performance of MLPCR if the available estimate of the error covariance matrix is of very poor quality (e.g., few calibration samples and few replicates of each sample spectrum). In situations where the error
covariance matrix is well approximated, the smoothing procedure is unnecessary and may in fact hinder the drift correction due to the distorting effects of the block smooth.

This error covariance smoothing technique was applied to the same data used in the above simulations, with the results being included in Fig. 11 for comparison. The block filter size was chosen to be 13 × 13, and no attempt was made to find the best performing block filter under these circumstances. It is likely that better results can be achieved if some effort is made in selecting the block filter size; however, for the purposes of this work, it was only deemed necessary to show that improvements can be achieved from the operation. In the low correlated error scenario (Fig. 11a), it is apparent that distortion from the block smooth leads to artifacts in the estimated error covariance matrix which hamper calibration performance relative to both MLPCR without block smoothing and PCR. With very high degrees of correlated error (Fig. 11c), the block smoothing procedure enhances the performance of MLPCR to some extent, but the enhancement is not great. It is likely that in these situations, the error covariance structure is well estimated with a minimal number of samples, and thus the block smooth does little to enhance the information contained in the estimate. With a moderate amount of correlated error, however, the block smooth leads to a significant improvement in the performance of MLPCR and, in the simulations conducted in the course of this work, generally halves the number of replicates required to achieve a given RMSEP. These conditions are best suited for the application of the block smooth, since the error covariance structure is prominent enough to cause serious degradations in the RMSEP of PCR, but the signal-to-noise ratio of the error covariance matrices estimated from only a few replicates is still quite low.
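A block smoother of the kind applied to the covariance estimates can be sketched in a few lines. The function name, the rejection of even widths, and the truncation of blocks at the matrix edges are assumptions made for the example; the text does not say how the edges were treated.

```python
def block_smooth(matrix, w):
    """Replace each element by the mean of the w x w block centred on it.
    Blocks are truncated at the matrix edges (an assumption)."""
    if w % 2 == 0:
        raise ValueError("block width w should be odd so the block has a centre")
    half = w // 2
    n_rows, n_cols = len(matrix), len(matrix[0])
    smoothed = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            block = [matrix[r][c]
                     for r in range(max(0, i - half), min(n_rows, i + half + 1))
                     for c in range(max(0, j - half), min(n_cols, j + half + 1))]
            row.append(sum(block) / len(block))
        smoothed.append(row)
    return smoothed

# Hypothetical 3 x 3 symmetric "covariance estimate" with a ridge on the diagonal.
noisy_cov = [[4.0, 1.0, 0.0],
             [1.0, 4.0, 1.0],
             [0.0, 1.0, 4.0]]
smooth_cov = block_smooth(noisy_cov, 3)
```

The averaging preserves the symmetry of the estimate but also flattens a sharp diagonal ridge, which is consistent with the distortion noted above when the correlation is weak and the true structure is nearly diagonal.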
As expected, once a reasonable number of replicates are used to estimate the error covariance structure (~12 in Fig. 11b), the block smoothing procedure does little to enhance the quality of the estimated error covariance matrix and, thus, does little to improve the performance of MLPCR. Overall, the error covariance smoothing did enhance the performance of MLPCR when significant levels of correlated error were present and tended to reduce the number of replicates required to achieve accuracy beyond that of derivative PCR methods.

Experimental Data. MLPCR and derivative PCR were compared to conventional PCR in handling baseline drift in the experimentally acquired diffuse reflectance spectra previously described. Examinations of the error covariance structure revealed that it was very similar between different samples of the calibration set, allowing for the pooling of the replicate estimates. Error covariance matrices were calculated for each set of sample replicates. Subsequently these 15 error covariance estimates were pooled to yield a pooled estimate of the error covariance matrix (shown in Fig. 12), which was used with MLPCR for multicomponent calibration.

Because a distinct prediction set was unavailable, cross-validation was used to select model parameters and estimate the predictive performances of the competing methods. The root-mean-squared error of cross-validation (RMSECV) was used as an indicator function, and the cross-validation procedure was conducted by leaving out
FIG. 12. Pooled estimated error covariance matrix for the experimentally obtained NIR diffuse reflectance measurements.
one sample at a time (all replicates of a sample). Several model parameters had to be systematically altered in the cross-validation regime: number of latent variables (all models), derivative filter width (derivative PCR), and derivative filter properties (derivative PCR). Lacking other information, the model parameters corresponding to the absolute minimum value in the RMSECV were taken to indicate the best performing model. For derivative PCR, the properties of the derivatives were set (e.g., PCR with quadratic first-derivative preprocessing), and then the best filter width and number of latent variables were selected under these conditions by cross-validation. While the selection of optimal calibration characteristics and prediction error estimation are best done with the use of a validation data set, it is the relative performances that are of primary interest in this work. Lacking enough samples for a validation set, cross-validation error was the only reasonable measure available for this task.

Figure 13 depicts the result of a sample derivative preprocessing drift correction procedure (11-point quadratic second derivative), and MLPCA drift correction. The magnitudes of the derivative spectra have been dramatically reduced, and some variability in the calibration spectra has been reduced. The largest variation in the derivative spectra appears in the spikes that remain, while the variation in most of the flatter areas of the original spectra has been all but annihilated. The MLPCA-corrected spectra, however, exhibit rather different variations. Significant variance still exists from sample to sample, even in the very flat regions of the MLPCA-corrected reflectance spectra. The variation and information that have been preserved in the MLPCA drift-corrected spectra were largely removed by derivative preprocessing as a result of the sharp low-frequency attenuation of these filters.
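The leave-one-sample-out scheme described above, in which every replicate of a sample is held out together, can be sketched as below. The function names and the small list of sample labels are hypothetical; the model fitting step is omitted because it depends on the calibration method being compared.

```python
import math

def leave_one_sample_out(sample_ids):
    """Yield (train_idx, test_idx) pairs; each fold holds out ALL replicates of
    one sample so that replicates never appear on both sides of the split."""
    for held_out in sorted(set(sample_ids)):
        test = [i for i, s in enumerate(sample_ids) if s == held_out]
        train = [i for i, s in enumerate(sample_ids) if s != held_out]
        yield train, test

def rmsecv(y_true, y_pred):
    """Root-mean-squared error over all cross-validated predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical labels: spectra 0-2 are replicates of sample 0, 3-4 of sample 1, etc.
sample_ids = [0, 0, 0, 1, 1, 2, 2, 2]
folds = list(leave_one_sample_out(sample_ids))
```

Selecting the filter width and number of latent variables then amounts to repeating this loop for each candidate setting and keeping the one with the smallest RMSECV. Holding out replicates as a group matters here because replicates of the same sample share composition, so splitting them would make the error estimate optimistic.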
Figure 14 gives a visual comparison of a single set of replicate measurements corrected under these conditions.

FIG. 13. A comparison of drift correction on the bulk calibration data (73 spectra) by a best-case derivative filter (11-point quadratic second derivative), and MLPCA.

FIG. 14. Five replicate diffuse reflectance spectra of an ABS polymer sample without preprocessing and the resulting replicate spectra after derivative preprocessing (11-point quadratic second derivative) and MLPCA drift correction.

The results of the cross-validation and calibration procedures are summarized in Table I as RMSECVs and performance ratios (PRs), showing the performance of PCR, derivative PCR (under a variety of best-case filter conditions), and MLPCR. The performance ratio is defined as the relative performance of MLPCR to the other calibration methods, or

$$\mathrm{PR} = \frac{\mathrm{RMSECV}_{\mathrm{MLPCR}}}{\mathrm{RMSECV}_{\mathrm{other}}} \qquad (27)$$
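As a quick check on how Eq. 27 should be read, the ratio can be written as a one-line function. The function name and the RMSECV values below are hypothetical and are not taken from Table I.

```python
def performance_ratio(rmsecv_mlpcr, rmsecv_other):
    """Eq. 27: PR = RMSECV_MLPCR / RMSECV_other."""
    if rmsecv_other <= 0:
        raise ValueError("RMSECV of the comparison method must be positive")
    return rmsecv_mlpcr / rmsecv_other

# Hypothetical values: MLPCR has the lower cross-validation error here.
pr_example = performance_ratio(0.40, 0.50)
```

A value below 1 means MLPCR has the smaller cross-validation error; a value above 1 means the other method does.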
The PR will exceed unity for calibration methods that demonstrate performance superior to that of MLPCR. For all three analytes of interest in the calibration set, MLPCR outperforms both the best PCR and the derivative PCR models in its predictions. In some cases, such as with component 2, some derivative PCR methods perform comparably with MLPCR. It must be kept in mind, however, that the expressed results are best-case scenarios for derivative filtering and the result of extensive searches for optimal derivative PCR parameters.

Also shown in Table I are the RMSECVs for MLPCR using a smoothed error covariance matrix. In this application, the smoothing operation resulted in little change in the MLPCR cross-validation error. This result is unsurprising, however, since the error covariance structure for these calibration data is very prominent and is well estimated by pooling the estimated error covariance matrices. A visual inspection of the estimated error covariance matrix (Fig. 12) confirms that it does appear to have a very high signal-to-noise ratio, implying that there is reasonable precision in the estimation of the error covariance structure.

CONCLUSION

The objective of this work was to investigate derivative preprocessing as a method of drift noise reduction in multivariate spectral data. This examination was carried out from the perspective that baseline drift can be characterized as correlated measurement errors, and that derivative filtering alleviates some drift noise by reducing the covariance terms in the error covariance matrices (via Eq. 11). While this approach is often successful to some degree, derivative filters cannot be considered optimal, since the error covariance matrix can rarely be diagonalized by their operation. In addition, the use of derivative filters modifies the composition of the chemical signals in a fashion that is very difficult to predict a priori, making the effects on figures of merit in multivariate calibration largely unpredictable.

Derivative filters operate blindly in reducing drift noise and, therefore, must be chosen on a trial-and-error basis, but maximum likelihood PCA uses error covariance information obtained from replicate measurements to achieve the simultaneous drift correction and the maximum likelihood projection of the spectral data into a principal component space. It was shown that MLPCA is the optimal filter from a drift reduction perspective, since MLPCA uses error covariance information to diagonalize the error covariance matrix and thus eliminate drift noise. The regression counterpart to MLPCA, MLPCR, is consequently an optimal calibration method to use when drift noise plagues the acquired data.

Baseline drift poses a significant threat to the precision
TABLE I. Calibration performances for PCR, MLPCR, and various forms of derivative PCR. The filter condition (O, D) refers to O, the order of the polynomial function used to determine the derivative, and D, the first (1) or second (2) derivative. (RMSECV: root mean squared error of cross-validation; PR: performance ratio of the RMSECV of MLPCR to the RMSECV of the other calibration methods; LV: number of latent variables selected.)

Original data set

| Component | Method: filter condition (O, D) | Width | RMSECV | LV | PR |
|---|---|---|---|---|---|
| 1 | PCR | - | 1.11 | 4 | 0.26 |
| 1 | Derivative PCR: 1,1 | 3 | 0.82 | 4 | 0.35 |
| 1 | Derivative PCR: 2,1 | 3 | 0.47 | 6 | 0.62 |
| 1 | Derivative PCR: 2,2 | 7 | 0.44 | 6 | 0.66 |
| 1 | MLPCR | - | 0.29 | 7 | 1.00 |
| 1 | MLPCR, ecv smoothing | 9 | 0.33 | 7 | 0.88 |
| 2 | PCR | - | 0.95 | 6 | 0.29 |
| 2 | Derivative PCR: 1,1 | 3 | 0.31 | 3 | 0.90 |
| 2 | Derivative PCR: 2,1 | 5 | 0.32 | 5 | 0.88 |
| 2 | Derivative PCR: 2,2 | 9 | 0.30 | 9 | 0.93 |
| 2 | MLPCR | - | 0.28 | 6 | 1.00 |
| 2 | MLPCR, ecv smoothing | 9 | 0.31 | 5 | 0.90 |
| 3 | PCR | - | 1.24 | 7 | 0.40 |
| 3 | Derivative PCR: 1,1 | 7 | 0.66 | 7 | 0.74 |
| 3 | Derivative PCR: 2,1 | 7 | 0.60 | 7 | 0.82 |
| 3 | Derivative PCR: 2,2 | 9 | 0.55 | 9 | 0.89 |
| 3 | MLPCR | - | 0.49 | 7 | 1.00 |
| 3 | MLPCR, ecv smoothing | 9 | 0.45 | 7 | 1.09 |
and accuracy of many multivariate calibration methods. Derivative preprocessing has been widely employed to combat this problem in the past, but since it is suboptimal in terms of drift correction, its application requires time-consuming searches for the best filter characteristics for a given application. Unfortunately, the spectral interpretability also suffers upon differentiation. In this work, MLPCR was consistently found to perform as well as or better than derivative PCR when reasonable estimates of the error covariance structure were available. It is therefore recommended that, provided that error covariance information is obtainable, MLPCR be used as a calibration method for data corrupted by baseline drift.

ACKNOWLEDGMENTS

The authors are grateful for helpful conversations with Randy Pell and Mary Beth Seasholtz of the Dow Chemical Company, instrumental analyses done on the resin samples by Dave Albers, and financial support provided by the Natural Science and Engineering Research Council of Canada and the Dow Chemical Company. C.D.B. was supported in part by a grant from the Sumner Foundation.

1. C. D. Brown and P. D. Wentzell, J. Chemom. 13, 133 (1999).
2. O. E. de Noord, Chemom. Intell. Lab. Syst. 23, 65 (1994).
3. N. M. Faber, Anal. Chem. 71, 557 (1999).
4. M. B. Seasholtz and B. R. Kowalski, Anal. Chim. Acta 277, 165 (1993).
5. V. J. Hammond and W. C. Price, J. Opt. Soc. Am. 43, 924 (1953).
6. E. Tannenbauer, P. B. Merkel, and W. H. Hammill, J. Phys. Chem. 21, 311 (1953).
7. J. D. Morrison, J. Chem. Phys. 21, 1767 (1953).
8. T. C. O'Haver and G. L. Green, Anal. Chem. 48, 312 (1976).
9. J. E. Cahill, Am. Lab. 11, 79 (1979).
10. T. C. O'Haver and T. Begley, Anal. Chem. 53, 1876 (1981).
11. T. R. Griffiths, K. King, H. V. St. A. Hubbard, M.-J. Shwing-Weill, and J. Meullemeestre, Anal. Chim. Acta 143, 163 (1982).
12. D. G. Cameron and D. J. Moffatt, Anal. Chem. 41, 539 (1987).
13. W. F. Maddams and W. L. Mead, Spectrochim. Acta, Part A 38, 437 (1982).
14. L. L. Juhl and J. H. Kalivas, Anal. Chim. Acta 207, 125 (1988).
15. P. D. Wentzell, D. T. Andrews, D. C. Hamilton, K. Faber, and B. R. Kowalski, J. Chemom. 11, 339 (1997).
16. P. D. Wentzell, D. T. Andrews, and B. R. Kowalski, Anal. Chem. 69, 2299 (1997).
17. A. Savitzky and M. J. E. Golay, Anal. Chem. 36, 1627 (1964).
18. J. Steiner, Y. Termonia, and J. Deltour, Anal. Chem. 44, 1906 (1972).
19. A. Lorber, Anal. Chem. 58, 1167 (1986).
20. K. Faber, A. Lorber, and B. R. Kowalski, J. Chemom. 11, 419 (1997).
21. K. S. Booksh and B. R. Kowalski, Anal. Chem. 66, 782A (1994).
22. E. Sanchez and B. R. Kowalski, J. Chemom. 2, 247 (1988).
23. P. D. Wentzell and M. T. Lohnes, Chemom. Intell. Lab. Syst. 45, 65 (1999).
24. R. J. Pell and B. R. Kowalski, J. Chemom. 5, 375 (1991).
25. K. Takeuchi, H. Yanai, and B. N. Mukherjee, Foundations of Multivariate Analysis: A Unified Approach by Means of Projection onto Linear Subspaces (Wiley Eastern, New Delhi, 1982).