Examples for the Application of Statistical Data Analysis in Microsensor Technology

M. Marth¹, M. Rapp², T. Wessa³, J. Honerkamp¹

¹ Freiburger Materialforschungszentrum FMF, Universität Freiburg
² Forschungszentrum Karlsruhe, Institut für Instrumentelle Analytik
³ Schering AG, Berlin

Abstract. Microsensor technology is the most advanced area in the field of microsystem technology. However, due to the typically low signal-to-noise ratios of these microsensors, it is necessary to employ statistical methods for the data analysis of these systems. Taking a surface acoustic wave sensor array as an example, two applications of statistical data analysis in this field are presented. First, it is shown how a sensor array can be optimized for certain measurement tasks. Second, a new method for automatic disturbance detection is presented.

1 Introduction

Microsensor technology is the most advanced field of microsystem technology. Some sensor systems have been in daily use for a long time, e.g. acceleration sensors in airbags. Within this field, so-called sensor arrays have attracted a lot of interest. Sensor arrays consist of a number of sensors that differ only slightly from each other. The differences are easily achieved: a temperature gradient across the sensors, or different coatings on otherwise identical sensors, are sufficient (see below for an example). In such a sensor array a single sensor is comparable to a wavelength in spectroscopy. However, contrary to a wavelength, the sensors do not need to be adjusted; they only need to be different from each other. This property tremendously simplifies production and therefore lowers the costs. Another advantage is that sensor arrays can be easily adapted to new measurement tasks. These properties make the use of sensor arrays attractive in fields like environmental sensing and process control. The drawback of these systems is that they rarely achieve the high signal-to-noise ratios or the selectivity of expensive laboratory systems that are optimized for certain applications. In this paper we present two examples of the application of statistical data analysis to overcome these disadvantages. The methods are shown in conjunction with a surface acoustic wave sensor array, but are of course applicable to other sensor arrays as well.

2 Experimental

A microsensor array based on surface acoustic wave (SAW) devices has been developed at the Institute for Instrumental Analysis of the Research Center in Karlsruhe. SAW devices are used for a variety of applications in signal-processing electronics and have therefore become an item of industrial mass production of microchips. In a modified form they can be used as highly sensitive detectors for gases or organic components in water. The devices are coated with a selective sorptive layer and serve as the frequency-determining element of an oscillator circuit [1]. The mass change of the coating arising from the sorption of an analyte then leads to a proportional shift of the oscillation frequency. This shift is used as the sensor signal. To measure the concentrations of several gases, a number of SAW devices are coated with different polymers which have a certain degree of selectivity towards different organic gas components. A quantitative analysis of organic gases and their mixtures requires an array of several sensors with different coatings, which generate different sensor signals for a specific gaseous component and thus yield a typical spectrum for a certain analyte. The sensor system consists of an array of nine SAW resonators working at an operating frequency of 433.92 MHz. Eight sensor oscillators, one common reference oscillator for temperature compensation, circuits for frequency mixing and a gas distribution system are mounted together in a compact monolithic housing (see fig. 1). The system, named 'SAGAS' (SAW Aroma und Gas Analyse System), is commercially available [2, 3].

Fig. 1. Principle of chemical sensing with surface acoustic waves: the two SAW devices each serve as the frequency-determining element of an oscillator circuit. One is coated and represents the real sensor; the other one is left uncoated and serves as a reference for temperature compensation. The difference frequency is generated by mixing both oscillator signals. The sorption of analytes causes a shift of the difference frequency, which is easily counted by a low-frequency counter.

3 Model

The sensor system outlined in the previous section is an example of a system that can be described with the linear additive model. The relation between the $K$-dimensional vector of analyte concentrations $x$ and the $M$-dimensional vector of sensor signals $y$ is given by the linear relation

$$ y = V x + e. \tag{1} $$

$V$ (an $M \times K$ matrix) contains the $M$-dimensional spectra $v_i$ ($i = 1, \dots, K$) of the $K$ different analytes. $e$ denotes the measurement errors, which are assumed to be normally distributed, i.e. $e_i \sim N(0, \sigma^2)$. $V$ is usually estimated from calibration measurements.
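The linear additive model of eq. 1 can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the spectra `V`, the concentrations `x_true` and the noise level `sigma` are made-up values, and ordinary least squares is used here purely as the simplest way to recover $x$ from $y$ when $V$ is known.

```python
import numpy as np

rng = np.random.default_rng(0)

M, K = 8, 2                                # 8 sensors, 2 analytes (illustrative sizes)
V = rng.uniform(0.5, 2.0, size=(M, K))     # hypothetical spectra, one column per analyte
x_true = np.array([1.0, 0.5])              # hypothetical analyte concentrations
sigma = 0.01                               # assumed noise standard deviation

e = rng.normal(0.0, sigma, size=M)         # e_i ~ N(0, sigma^2)
y = V @ x_true + e                         # sensor signals according to eq. 1

# Ordinary least-squares estimate of the concentrations from y, given V.
x_hat, *_ = np.linalg.lstsq(V, y, rcond=None)
print(x_hat)
```

With low noise the estimate `x_hat` stays close to `x_true`; increasing `sigma` shows how quickly the estimate degrades for a given set of spectra.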

4 Optimization of Sensor Arrays

The simple structure of sensor arrays only requires the sensors to be slightly different in some way. Therefore, it is often possible to produce far more different sensors than can be built into the array. Consequently, the question arises which sensors to choose for a certain measurement task in order to obtain the optimal array for that task.

4.1 Theoretical

The problem of selecting sensors to obtain the best array for a certain application can easily become too complex to be tackled with experimental intuition. Say there are $M_0$ sensors available, out of which $M$ are to be chosen for an array. This means that there exist $\binom{M_0}{M}$ possible combinations. The problem is NP-complete, i.e. there is probably no efficient algorithm that is certain to find the exact optimum [4]. Thus, the problem is two-fold: both a criterion to compare different combinations of sensors and an algorithm to find at least an approximate solution for the best sensor array with respect to this criterion are needed.
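To give a feeling for the size of the search space, the number of combinations can be evaluated directly. The values $M_0 = 16$ and $M = 8$ are those of the application in section 4.2; the snippet is only a worked arithmetic example.

```python
from math import comb

M0, M = 16, 8            # candidate sensors and array size, as in section 4.2
n_arrays = comb(M0, M)   # binomial coefficient "M0 choose M"
print(n_arrays)          # 12870
```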

Criterion. The purpose of any sensor system is, of course, to enable an accurate estimation of analyte concentrations $x$ from sensor signals $y$. Lorber has shown how the variance of such an estimation can be minimized [5], which results in a sensor system that serves its purpose best. He suggested evaluating the spectra $V$ of the substances to be measured in order to obtain so-called figures of merit. The idea behind Lorber's figures of merit [6] is to maximize, for every spectrum $v_i$, the part that is orthogonal to all remaining spectra. This will minimize the variance of later predictions. Let $v_i$ be the spectrum of the $i$-th substance ($i = 1, \dots, K$) and $V_{\setminus i}$ be the $M \times (K-1)$ matrix of the remaining spectra. The part of $v_i$ that is orthogonal to $V_{\setminus i}$ can be written as

$$ v_i^{\perp} = \left(1 - V_{\setminus i} V_{\setminus i}^{+}\right) v_i, \tag{2} $$

Fig. 2. Illustration of orthogonal spectra for 2 sensors ($M = 2$) and 2 substances ($K = 2$). Left: the 2 spectra in the usual plot of sensor signal vs. sensor number. Right: the two spectra $v_1$ and $v_2$ in sensor signal space (axes: sensor 1, sensor 2), together with $v_1^{\perp}$, the part of $v_1$ that is orthogonal to all remaining spectra (in this simple case only $v_2$).

where $V_{\setminus i}^{+}$ is the Moore-Penrose inverse of $V_{\setminus i}$. Figure 2 illustrates the concept of orthogonal spectra for the simple case $M = K = 2$. Using this definition, different criteria are known in the literature that characterize how well one analyte $i$ can be detected with a certain array [7, 8]:

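net analyte signal (NAS):

$$ \lVert v_i^{\perp} \rVert = \sqrt{\sum_{j=1}^{M} \left(v_i^{\perp}\right)_j^{2}} \tag{3} $$

signal-to-noise ratio (STN):

$$ \lVert v_i^{\perp} \,./\, \sigma \rVert = \sqrt{\sum_{j=1}^{M} \left(\frac{(v_i^{\perp})_j}{\sigma_j}\right)^{2}} \tag{4} $$

selectivity:

$$ \frac{\lVert v_i^{\perp} \rVert}{\lVert v_i \rVert} = \frac{\sqrt{\sum_{j=1}^{M} (v_i^{\perp})_j^{2}}}{\sqrt{\sum_{j=1}^{M} (v_i)_j^{2}}} \tag{5} $$

relative selectivity:

$$ \lVert v_i^{\perp} \,./\, v_i \rVert = \sqrt{\sum_{j=1}^{M} \left(\frac{(v_i^{\perp})_j}{(v_i)_j}\right)^{2}}. \tag{6} $$

($./$ is to be understood as a component-wise division.) Let $\rho_i$ be one of these criteria for one analyte $i$ ($i = 1, \dots, K$). The reduced (harmonic) mean is then needed to calculate the quality $\rho$ of a complete array by summing over all analytes [9]:

$$ \rho^{-1} = K^{-1} \sum_{i=1}^{K} \rho_i^{-1}. \tag{7} $$

The array to be chosen is the one that maximizes $\rho$, as this is the one for which the variance of later predictions will be minimal and that therefore serves its purpose best.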
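The orthogonal part of eq. 2 and the combination of the per-analyte values in eq. 7 translate into a few lines of linear algebra. The sketch below is an illustration, not the authors' code: the function names are hypothetical, only the net analyte signal and the selectivity are implemented, and the example matrix uses made-up numbers for the $M = K = 2$ case of fig. 2.

```python
import numpy as np

def orthogonal_part(V, i):
    """Part of spectrum v_i that is orthogonal to all other columns of V (eq. 2)."""
    v_i = V[:, i]
    V_rest = np.delete(V, i, axis=1)
    projector = V_rest @ np.linalg.pinv(V_rest)   # projects onto the span of the other spectra
    return v_i - projector @ v_i

def net_analyte_signal(V, i):
    """Euclidean norm of the orthogonal part of v_i."""
    return np.linalg.norm(orthogonal_part(V, i))

def selectivity(V, i):
    """Net analyte signal relative to the norm of the full spectrum v_i."""
    return net_analyte_signal(V, i) / np.linalg.norm(V[:, i])

def array_quality(V, criterion=selectivity):
    """Harmonic mean of the per-analyte criterion over all K analytes (eq. 7)."""
    K = V.shape[1]
    values = np.array([criterion(V, i) for i in range(K)])
    return K / np.sum(1.0 / values)

# Made-up spectra for two sensors (rows) and two substances (columns).
V = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print(net_analyte_signal(V, 0), selectivity(V, 0), array_quality(V))
```

Because a harmonic mean is dominated by its smallest term, a single analyte with a small criterion value pulls the quality of the whole array down. If two spectra are exactly collinear the criterion is zero and the division in `array_quality` is undefined, so a real implementation would need to guard that case.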

Optimization Algorithm. It can be very time-consuming, if not impossible, to find the combination of sensors that maximizes the chosen criterion. Therefore a genetic algorithm [10, 11] was developed that gives an approximate solution, but is much faster than checking all possible combinations. Genetic algorithms are optimization techniques that mimic biological evolution. An array is represented as an individual with a certain set of genes (the so-called genome) corresponding to the combination of sensors the array consists of. The members of a population (i.e. a number of different arrays) compete with each other to pass on their genes to the next generation. The biological term fitness corresponds to the criteria for evaluating an array described in detail above. As in evolution theory, three concepts are implemented: recombination, survival of the fittest (the best individuals are allowed to mate and pass on their genes) and mutation (random changes in the genome). After a number of repeated applications of these principles only the best genomes, i.e. arrays, have survived [9, 12].

The genetic algorithm is implemented in a software package called ASSDeTo (A Sensor System Development Tool) [13]. It was developed with the experimenter in mind who wants to be shielded from theoretical considerations. It therefore uses a graphical user interface and shows non-vital parameters only on demand (see figure 3). The search algorithm and a suitable criterion are chosen automatically. ASSDeTo is written in the Java programming language and therefore runs on Windows 95/NT, Macintosh, Unix, and other operating systems. The software is commercially available from the authors.
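The three concepts named above (recombination, survival of the fittest, mutation) can be sketched for the sensor selection problem as follows. This is a minimal illustration and not the ASSDeTo implementation: the genome encoding as a set of sensor indices, the operators, all parameter values and the use of the harmonic mean of the net analyte signals as fitness are assumptions made for this example.

```python
import numpy as np

def fitness(V_all, genome):
    """Harmonic mean of the net analyte signals for the sensors in `genome`."""
    V = V_all[sorted(genome), :]
    K = V.shape[1]
    nas = []
    for i in range(K):
        rest = np.delete(V, i, axis=1)
        v_perp = V[:, i] - rest @ (np.linalg.pinv(rest) @ V[:, i])
        nas.append(np.linalg.norm(v_perp))
    nas = np.maximum(np.array(nas), 1e-12)        # guard against division by zero
    return K / np.sum(1.0 / nas)

def recombine(rng, parent_a, parent_b, M):
    """The child draws M distinct sensors from the union of both parents."""
    pool = sorted(parent_a | parent_b)
    return frozenset(rng.choice(pool, size=M, replace=False).tolist())

def mutate(rng, genome, M0, rate):
    """With probability `rate`, swap one sensor for one that is not in the array."""
    if rng.random() >= rate:
        return genome
    inside = sorted(genome)
    outside = sorted(set(range(M0)) - genome)
    if not outside:
        return genome
    drop = inside[rng.integers(len(inside))]
    add = outside[rng.integers(len(outside))]
    return frozenset((genome - {drop}) | {add})

def genetic_search(V_all, M, pop_size=30, generations=40, rate=0.3, seed=0):
    """Approximate search for the M-sensor subset with the highest fitness."""
    rng = np.random.default_rng(seed)
    M0 = V_all.shape[0]
    population = [frozenset(rng.choice(M0, size=M, replace=False).tolist())
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Survival of the fittest: the better half survives and may mate.
        population.sort(key=lambda g: fitness(V_all, g), reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            child = recombine(rng, parents[a], parents[b], M)
            children.append(mutate(rng, child, M0, rate))
        population = parents + children
    return max(population, key=lambda g: fitness(V_all, g))

# Hypothetical spectra of 16 candidate sensors for 2 analytes.
V_all = np.random.default_rng(1).uniform(0.0, 1.0, size=(16, 2))
best = genetic_search(V_all, M=8)
print(sorted(best), fitness(V_all, best))
```

Keeping the parents in the next generation means the best fitness found so far never decreases. The result is still only an approximate optimum, as stated in the text, and depends on the random seed and the chosen population size.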

Fig. 3. Screenshot of ASSDeTo running under the Windows operating system.

4.2 Application

One intended application of the SAW sensor array is the quantification of binary mixtures, e.g. water and MeOH. It was necessary to choose 8 sensors from 16 available ones to optimize the array for this application. The chosen sensor arrays were compared using an independent criterion describing the quality of a sensor array. For this test the data had to be split into a calibration set and a test set. In order to avoid overfitting, the data were split such that no vector $x$ was in both the calibration and the test set. On the calibration set a PLS [14] calibration was performed and used for prediction of the test set. The independent quality criterion was $R^2$, known from regression analysis, which is defined as

$$ R^2 = 1 - \frac{N_1^{-1} \sum_{i=1}^{K} \sum_{j=1}^{N_1} \left(x_{i,j}^{\text{predicted}} - x_{i,j}^{\text{true}}\right)^2}{(N_1 - 1)^{-1} \sum_{i=1}^{K} \sum_{j=1}^{N_1} \left(x_{i,j}^{\text{true}} - \bar{x}_{i}^{\text{true}}\right)^2}, \tag{8} $$

where $\bar{x}_{i}^{\text{true}}$ is the mean true concentration of analyte $i$.
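The quality criterion of eq. 8 can be computed as sketched below. The sketch assumes that $N_1$ is the number of test measurements, that the numerator is normalized by $N_1^{-1}$ and the denominator by $(N_1 - 1)^{-1}$, and that the reference value in the denominator is the mean true concentration of each analyte. The function name and the array layout (analytes in rows, measurements in columns) are choices made for this example.

```python
import numpy as np

def r_squared(x_true, x_pred):
    """R^2 following eq. 8; rows are the K analytes, columns the N1 test measurements."""
    x_true = np.asarray(x_true, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    n1 = x_true.shape[1]
    # Mean squared prediction error, summed over analytes.
    residual = np.sum((x_pred - x_true) ** 2) / n1
    # Spread of the true concentrations around their per-analyte mean.
    mean_true = x_true.mean(axis=1, keepdims=True)
    spread = np.sum((x_true - mean_true) ** 2) / (n1 - 1)
    return 1.0 - residual / spread

x_true = np.array([[0.0, 1.0, 2.0, 3.0]])
print(r_squared(x_true, x_true))                            # perfect prediction
print(r_squared(x_true, np.full_like(x_true, 1.5)))         # predicting the mean
```

A perfect prediction gives $R^2 = 1$. Because of the different normalizations in numerator and denominator, always predicting the mean gives $1/N_1$ rather than exactly zero under this reading of the formula.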

The combination chosen by ASSDeTo consisted of the sensors {1, 2, 6, 7, 10, 12, 15, 16}. A combination independently chosen by experimenter's intuition was {2, 5, 8, 9, 11, 12, 13, 15}, i.e. quite different, as only the three sensors 2, 12, and 15 were included in both combinations. We have calculated $R^2$ as described above for these combinations and for a random choice (the first 8 sensors). The results are:

combination    $R^2$
random         0.766
intuition      0.950
ASSDeTo        0.965

It can be observed from the table that experimenter's intuition can lead to a very good choice. However, that choice can still be improved by the application of a systematic search. It is expected that this difference is even larger when there are more sensors to choose from.

5 Disturbance Detection

The inexpensive production of microsensor arrays makes it possible to produce them in large quantities. One field where they find widespread use is environmental sensing. However, in this case it is necessary that the systems are able to operate at least partly autonomously. Hence, they need to detect disturbances independently. There are a number of possible causes for disturbances, for example

- a substance that was not calibrated for might be present,
- there might be an environmental change like an increase in water content, which could render calibration measurements worthless, but might not be noticed at first,
- or a vital sensor in the array might be broken.

In this article these cases will not be discussed separately since they are mathematically indistinguishable. They will be referred to as the occurrence of a disturbance. In the following we present a new method [15] for the automatic detection of these disturbances.

5.1 Theoretical

Let $Y_{\text{cal}}$ be an $M \times N$ calibration matrix that can be trusted to contain exactly $K$ factors, and let $y_{\text{test}}$ be a measurement to be tested for a disturbance. If there was a disturbance when $y_{\text{test}}$ was measured, this vector is not in the space spanned by the spectra $v_i$ ($i = 1, \dots, K$). The difficulty in checking this results from the measurement noise: a certain distance of $y_{\text{test}}$ from the subspace spanned by the $v_i$ has to be tolerated if it can be explained by the measurement noise of the system. To solve this problem we consider the new matrix $\tilde{Y}$ that is constructed from $Y_{\text{cal}}$ with $y_{\text{test}}$ appended as a new measurement: $\tilde{Y} := (Y_{\text{cal}}, y_{\text{test}})$. Let us examine the pseudorank $\mathrm{Rk}_p$ (the rank in the absence of noise) of this matrix. If the hypothesis $H_0$: '$y_{\text{test}}$ lies within the space spanned by the vectors in $V$' is true, there are only $K$ factors in $\tilde{Y}$, i.e. there was no disturbance when $y_{\text{test}}$ was measured. The alternative shall be denoted $H_A$. We have

$$ \mathrm{Rk}_p(\tilde{Y}) = \begin{cases} K & H_0 \text{ true} \\ K + 1 & H_A \text{ true.} \end{cases} \tag{9} $$
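In the absence of noise, the statement of eq. 9 can be checked directly with a numerical rank computation. The following sketch uses made-up spectra and concentrations; the sizes $M = 8$, $N = 44$, $K = 2$ are borrowed from the application in section 5.2, and the disturbance is simply an extra random contribution added to the test vector.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 8, 44, 2
V = rng.uniform(0.5, 2.0, size=(M, K))            # hypothetical spectra
Y_cal = V @ rng.uniform(0.0, 1.0, size=(K, N))    # noise-free calibration measurements

y_ok = V @ np.array([0.3, 0.7])                   # lies in the space spanned by V
y_bad = y_ok + rng.uniform(0.5, 2.0, size=M)      # extra, uncalibrated contribution

# Rank of the calibration matrix with the test vector appended as a new column.
rank_ok = np.linalg.matrix_rank(np.column_stack([Y_cal, y_ok]))
rank_bad = np.linalg.matrix_rank(np.column_stack([Y_cal, y_bad]))
print(rank_ok, rank_bad)
```

With measurement noise every such matrix has full numerical rank, which is why the pseudorank has to be decided statistically rather than with a plain rank computation.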

Therefore, the problem is shifted to finding the pseudorank of $\tilde{Y}$ with the additional knowledge that only the two values $K$ or $K + 1$ are possible. $\tilde{Y}$ includes two parts: a deterministic one (the term $Vx$ on the right-hand side of eq. 1) and a stochastic one, the measurement noise. The deterministic part can be estimated and removed. A well-known method for this is e.g. principal component analysis [16], which yields an estimate of the subspace spanned by the $v_i$. The difference between $\tilde{Y}$ and a projection of $\tilde{Y}$ onto this subspace serves as an estimator of the measurement noise. Let $\tilde{Y}'$ be the matrix of this difference and consider the $M \times M$ matrix $Z' := \tilde{Y}' \tilde{Y}'^{\mathsf{T}}$ (the superscript $\mathsf{T}$ denotes the transpose). This matrix has $M$ eigenvalues that shall be ordered and denoted from $\ell^{1}_{Z'}$ (smallest one) to $\ell^{M}_{Z'}$ (largest one). If $H_0$ is true these eigenvalues are purely stochastic. If not, the largest eigenvalue $\ell^{M}_{Z'}$ must be significantly larger than a purely stochastic distribution of the entries in $\tilde{Y}'$ would suggest, as there would be an additional deterministic part in $Z'$. It is therefore necessary to find an upper limit for $\ell^{M}_{Z'}$ with respect to a certain confidence level up to which $H_0$ can be accepted.

To obtain this limit the following result from the theory of random matrices is useful. Let $\Xi$ be a $\nu \times \mu$ matrix whose entries are independently distributed with $N(0, \sigma_0^2)$. Then $\Psi := \Xi \Xi^{\mathsf{T}}$ is Wishart-distributed with parameters $\mu$ and $\sigma_0^2$ [17]. Sugiyama [18] has shown that the largest eigenvalue $\ell$ of this matrix is distributed as

$$ P(\ell \le x) = \frac{\Gamma_{\nu}\!\left(\tfrac{1}{2}(\nu + 1)\right)}{\Gamma_{\nu}\!\left(\tfrac{1}{2}(\nu + \mu + 1)\right)} \exp\!\left(-\frac{\nu x}{2\sigma_0^2}\right) \left(\frac{x}{2\sigma_0^2}\right)^{\frac{1}{2}\nu\mu} {}_1F_1\!\left(\tfrac{1}{2}(\nu + 1);\ \tfrac{1}{2}(\nu + \mu + 1);\ \frac{x}{2\sigma_0^2}\, I_{\nu}\right), \tag{10} $$

where $\Gamma_y(x)$ is the multivariate gamma function

$$ \Gamma_y(x) := \pi^{y(y-1)/4} \prod_{i=1}^{y} \Gamma\!\left(x - \tfrac{1}{2}(i - 1)\right), $$

$I_{\nu}$ is the identity matrix of size $\nu$ and ${}_1F_1$ is the hypergeometric function with matrix argument [19]. In order to use this result to find an upper limit for $\ell^{M}_{Z'}$ with respect to a certain confidence level, matrices with the same number of degrees of freedom have to be compared. Therefore, one needs to set $\nu = M - K$ and $\mu = N + 1 - K$. Moreover, the noise variance $\sigma^2$, which takes the role of $\sigma_0^2$, needs to be estimated as well. Faber has given an estimator for $\sigma^2$ [20]:

$$ \hat{\sigma}^2 = \frac{\sum_{i=1}^{M-K} \ell^{i}_{Z}}{(M - K)(N + 1 - K)}, \tag{11} $$

where $\ell^{i}_{Z}$ are the eigenvalues of $Z := \tilde{Y} \tilde{Y}^{\mathsf{T}}$. For large $M$ or $N$ the computation of the distribution in eq. 10 can be difficult. Approximations have for example been given by Pillai and Chang [21]. Moreover, for standard values of the confidence level there are tabulated values available (see e.g. [22]). Using these results it is possible to test for a measurement $y_{\text{test}}$ whether there was a disturbance or not.
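The test procedure described above can be sketched numerically. Two substitutions are made here and should be read as assumptions rather than as the authors' method: the subspace is estimated with an uncentred principal component analysis via the singular value decomposition, and the upper limit for the largest eigenvalue is obtained by Monte Carlo simulation of $\nu \times \mu$ Gaussian matrices instead of evaluating eq. 10 or using tabulated values. All function names and the simulated data are hypothetical, and $N + 1 \ge M$ is assumed.

```python
import numpy as np

def largest_residual_eigenvalue(Y_cal, y_test, K):
    """Largest eigenvalue of Z' and the noise variance estimate of eq. 11."""
    Y = np.column_stack([Y_cal, y_test])            # augmented matrix, M x (N + 1)
    M, n_cols = Y.shape
    # Uncentred PCA via SVD: the first K left singular vectors estimate span(V).
    U, s, _ = np.linalg.svd(Y, full_matrices=False)
    U_K = U[:, :K]
    Y_res = Y - U_K @ (U_K.T @ Y)                   # estimate of the measurement noise
    eig_res = np.linalg.eigvalsh(Y_res @ Y_res.T)   # eigenvalues of Z', ascending
    # The eigenvalues of Z = Y Y^T are the squared singular values of Y.
    eig_Z = np.sort(s ** 2)
    sigma2_hat = eig_Z[:M - K].sum() / ((M - K) * (n_cols - K))
    return eig_res[-1], sigma2_hat

def monte_carlo_limit(M, N, K, sigma2, level=0.95, n_draws=2000, seed=0):
    """Empirical quantile of the largest eigenvalue of Xi Xi^T, Xi of size nu x mu."""
    rng = np.random.default_rng(seed)
    nu, mu = M - K, N + 1 - K
    draws = np.empty(n_draws)
    for d in range(n_draws):
        Xi = rng.normal(0.0, np.sqrt(sigma2), size=(nu, mu))
        draws[d] = np.linalg.eigvalsh(Xi @ Xi.T)[-1]
    return np.quantile(draws, level)

def is_disturbed(Y_cal, y_test, K, level=0.95):
    """Reject H0 if the largest residual eigenvalue exceeds the simulated limit."""
    M, N = Y_cal.shape
    ell, sigma2_hat = largest_residual_eigenvalue(Y_cal, y_test, K)
    return bool(ell > monte_carlo_limit(M, N, K, sigma2_hat, level))

# Simulated data with made-up spectra, concentrations and noise level.
rng = np.random.default_rng(1)
M, N, K, sigma = 8, 44, 2, 0.01
V = rng.uniform(0.5, 2.0, size=(M, K))
Y_cal = V @ rng.uniform(0.2, 1.0, size=(K, N)) + rng.normal(0.0, sigma, size=(M, N))
y_ok = V @ np.array([0.5, 0.5]) + rng.normal(0.0, sigma, size=M)
y_bad = y_ok + rng.uniform(0.5, 2.0, size=M)        # contribution of an unknown substance
print(is_disturbed(Y_cal, y_ok, K), is_disturbed(Y_cal, y_bad, K))
```

For an undisturbed measurement the test is expected to reject $H_0$ in roughly 5% of cases at this confidence level, so a single undisturbed example may occasionally be flagged. The simulated limit is only as accurate as the number of draws allows.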

5.2 Application

In order to test the method, the following data set was produced with the SAW microsensor array described above. The array was calibrated with mixtures of benzene and octane. For prediction, mixtures of benzene, octane, and xylene were used, i.e. xylene was treated as an unknown substance that caused a disturbance. There were 44 measurements for the calibration, 62 unspoiled prediction measurements (without xylene) and 120 spoiled measurements (with xylene) available (i.e. $N = 44$, $M = 8$, $K = 2$). All prediction measurements were tested at a confidence level of 0.95. Results are shown in fig. 4. The left plot shows the histogram of $\ell^{M}_{Z'}$ for the 62 unspoiled measurements. The upper limit that was found for $\ell^{M}_{Z'}$ is marked with a thick vertical line at $\ell^{M}_{Z'} = 1.65$. The normalized area right of the thick line represents the error of first degree ($H_0$ true, but rejected). The right plot shows the histogram of $\ell^{M}_{Z'}$ for the 120 spoiled measurements, and the normalized area left of the thick line represents the error of second degree ($H_0$ false, but accepted). It can be seen that both areas are small, and it can therefore be concluded that the measurements with and without disturbance are well separated. When this data set is used to compare the new method with a standard method in chemometrics, Malinowski's F-test [23, 24], the detection rate for disturbances improves by 30 percentage points (from 55% to 85%). The exact results are displayed in tab. 1.

Fig. 4. Histograms of the largest eigenvalue $\ell^{M}_{Z'}$ (horizontal axis: largest eigenvalue; vertical axis: relative frequency) for the unspoiled (left) and the spoiled (right) measurements. The largest tolerable value of $\ell^{M}_{Z'}$ was found to be 1.65, marked by the thick vertical line. Left: the normalized area right of the thick line represents the error of first degree. Right: the normalized area left of the thick line represents the error of second degree.

Table 1. Percentage of correctly analyzed data

                        largest eigenvalue test     Malinowski's F-test
no unknown substance    95% (1st deg. err.: 5%)     95% (1st deg. err.: 5%)
unknown substance       85% (2nd deg. err.: 15%)    55% (2nd deg. err.: 45%)

6 Conclusion

We have shown that statistical data analysis is a valuable tool for the analysis of sensor array data. Two applications, the optimization of sensor arrays and automatic disturbance detection, were presented. An important aspect is the development of software packages like ASSDeTo that allow advanced statistical methods to be used through a simple graphical user interface.

References

1. H. Wohltjen and R. Dessy, Anal. Chem., 51, 1458 (1979).
2. M. Rapp, B. Bo, A. Voigt, H. Gemmeke, and H. J. Ache, Fres. Anal. Chem., 352, 699 (1995).
3. M. Rapp, J. Reibel, S. Stier, A. Voigt, and J. Bahlo, Proc. IEEE Freq. Contr., 129 (1997).
4. M. Grötschel, L. Lovász, and A. Schrijver, Geometric Algorithms and Combinatorial Optimization, Springer, Berlin (1988).
5. A. Lorber and B. R. Kowalski, J. Chemometrics, 2, 67 (1988).
6. A. Lorber, Anal. Chem., 58, 1167 (1986).
7. K. S. Booksh and B. R. Kowalski, Anal. Chem., 66, 782A (1994).
8. R. Buser, Optimization of Chemical Sensor Arrays (in German), diploma thesis, Faculty of Physics, University of Freiburg (1998).
9. C. B. Lucasius, M. L. M. Beckers, and G. Kateman, Anal. Chim. Acta, 286, 135 (1994).
10. J. H. Holland, Adaptation in Natural and Artificial Systems, Univ. of Michigan Press, Ann Arbor, MI (1975).
11. D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA (1989).
12. U. Hörchner and J. H. Kalivas, Anal. Chim. Acta, 311, 1 (1995).
13. M. Marth, D. Maier, J. Honerkamp, M. Rapp, U. Stahl, and T. Wessa, Sens. & Act. B, submitted.
14. H. Wold, Encyclopedia of Statistical Sciences, Wiley, New York (1984).
15. M. Marth, D. Maier, J. Honerkamp, and M. Rapp, J. Chemometrics, 12, 249 (1998).
16. W. F. Massy, J. Am. Statist. Assoc., 60, 234-246 (1965).
17. J. Wishart, Biometrika, 20, 32 (1928).
18. T. Sugiyama, Ann. Math. Statist., 37, 995 (1966).
19. C. S. Herz, Ann. Math., 61, 474 (1955).
20. N. M. Faber, L. M. C. Buydens, and G. Kateman, J. Chemometrics, 7, 495 (1993).
21. K. C. S. Pillai and T. C. Chang, Ann. Inst. Stat. Math., 6 (Suppl.), 115 (1969).
22. R. C. Hanumara and W. A. Thompson, Biometrika, 55, 502 (1968).
23. E. R. Malinowski, Factor Analysis in Chemistry, Wiley, New York (1991).
24. E. R. Malinowski, J. Chemometrics, 3, 49 (1988), erratum: J. Chemometrics, 4, 102 (1990).

This article was processed using the LATEX macro package with LLNCS style
