Full Paper

Prediction of Key State Variables using Support Vector Machines in Bioprocesses

By Yunfeng Li and Jingqi Yuan*

DOI: 10.1002/ceat.200500182
Chem. Eng. Technol. 2006, 29, No. 3, 313–319

Support vector machines (SVM) are applied to the prediction of key state variables in bioprocesses, such as product concentration and biomass concentration, which commonly play an important role in bioprocess monitoring and control. A so-called rolling learning-prediction procedure is used to deal with the time variant property of the process and to establish the training database for the SVM predictor; it is characterized by a rolling update of the training database. As an example, the product concentration in industrial penicillin production is predicted, and a comparison is also made with three different artificial neural network architectures (FBNN, RBFN, and RNN). The test results indicate that a prediction accuracy of 1–3 % can be obtained for 4–40 h ahead prediction using the SVM, which is better than the best of the three artificial neural networks (ANNs). Moreover, for training with highly noisy data or for small-sample learning, the SVM also clearly outperforms FBNN, RBFN, and RNN.

[*] Y. Li, Prof. J. Yuan (author to whom correspondence should be addressed, [email protected]), Department of Automation, Shanghai Jiao Tong University, 1954 Huashan Lu, Shanghai 200030, PR China.

1 Introduction

In the bioprocess industries, key state variables such as the product and biomass concentrations are usually measured offline in the laboratory due to a lack of reliable online sensors and chemical analyzers. These variables, as significant indicators of process behavior, are utilized in online process monitoring and control. The delayed and relatively infrequent information resulting from offline assays can hardly meet the requirements of real-time monitoring and control, and ultimately affects product quantity and quality. As a result, online variable prediction has received considerable interest.

In the past, multivariate statistical techniques such as principal component regression (PCR), partial least squares (PLS), and multiple linear regression (MLR) have been widely used for the online prediction of state variables [1–3]. However, these techniques need a relatively large number of samples and are commonly sensitive to measurement errors as well [4]. In recent years, artificial neural networks (ANN), as a powerful nonlinear technique, have received particular attention since they provide a simple, straightforward approach to predicting state variables. ANN can simulate highly nonlinear dynamic relationships of the process without prior knowledge of the model structure, and have been successfully applied to variable prediction in bioprocesses [5–7]. In practical applications, however, some disadvantages related to convergence speed, network topology, bad local minima,



and the “over-fitting” phenomenon have a great influence on the prediction accuracy and limit a more extensive application of ANN.

The support vector machine (SVM) is a developing technique, although its origin can be traced back to the late 1960s [8, 9]. It is a novel, powerful machine learning method based on statistical learning theory (SLT), a small-sample statistical theory introduced by Vapnik [10]. In the SVM, the structural risk minimization (SRM) principle is adopted instead of the empirical risk minimization (ERM) principle generally employed in classical methods such as ANN and PLS. SRM seeks to minimize an upper bound of the generalization error rather than the training error [10, 11]. Therefore, the SVM is able to achieve an optimal network structure through a compromise between the quality of the approximation of the given data and the complexity of the approximating function. The “over-fitting” phenomenon of general ANN can thus be avoided, and excellent generalization performance is obtained. Moreover, the support vectors, which correspond to the hidden units of a general ANN, are determined automatically by SVM training; the difficult task of determining the network structure of a general ANN therefore does not exist for the SVM. Indeed, the SVM has recently been receiving increasing attention for its high accuracy and good generalization ability [12–15]. Compared with the theoretical research, however, reports on SVM applications are still relatively rare at present.

The purpose of this work is to study the online prediction of key state variables in bioprocesses using the SVM, and to compare its prediction accuracy with linear extrapolation as well as with three different ANN architectures, namely feed-forward back-propagation neural


networks (FBNN), radial basis function networks (RBFN), and recurrent neural networks (RNN). As a practical example, the product concentration in industrial penicillin production is predicted to test the prediction performance of these methods. In addition, a procedure named rolling learning-prediction is employed to deal with the time variant property of the process and to establish a training database for each sampling time [7].

This paper is organized as follows. Section 2 summarizes the standard SVM regression algorithm and the rolling learning-prediction procedure, and gives detailed implementation steps. In Section 3, the product concentration prediction in penicillin production is used to validate the SVM approach, and a comparison with the three ANNs, as well as with linear extrapolation, is made. Finally, Section 4 gives the discussion and conclusions.

2 Theory

2.1 Support Vector Machine Regression

SVM was originally used for classification purposes, but its principles can easily be extended to the task of nonlinear regression by the introduction of an alternative loss function [16]. The basic idea of SVM regression is to map the original data set S (a list of symbols is given at the end of the paper) into the mapped data set SF in a high-dimensional feature space F via a nonlinear mapping function φ(·), and then to perform a linear regression in this feature space [10]:

$$S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}, \quad x \in \mathbb{R}^n, \; y \in \mathbb{R}$$ (1)

$$S_F = \{(\varphi(x_1), y_1), (\varphi(x_2), y_2), \ldots, (\varphi(x_l), y_l)\}, \quad \varphi(x) \in \mathbb{R}^N, \; y \in \mathbb{R}$$ (2)

where N ≫ n represents the dimension of the feature space F. Defining the linear regression function in this feature space as Eq. (3), nonlinear function regression in the original space becomes linear function regression in the feature space:

$$f(x) = w^T \varphi(x) + b$$ (3)

where w = (w1, w2, ..., wN) is a weight vector which can be determined from the data by minimizing the function:

$$Q = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} L_\varepsilon(x_i, y_i, f)$$ (4)

Here ½‖w‖² is a weight-decay term used to regularize the weight sizes by penalizing large weights [12], C is a pre-specified positive value used as regularization parameter, and the sum of Lε(xi, yi, f) is the error term, i.e., the empirical risk in learning theory. Lε(·) is the so-called ε-insensitive loss function, which represents the loss or discrepancy between the output y belonging to a given input x and the goal function f(x). In SVM regression, the commonly used loss functions are the linear ε-insensitive loss function, the quadratic ε-insensitive loss function, and the Huber loss function [10]. Here, the linear ε-insensitive loss function is adopted:

$$L_\varepsilon(x_i, y_i, f) = \begin{cases} 0, & \text{for } |f(x_i) - y_i| < \varepsilon \\ |f(x_i) - y_i| - \varepsilon, & \text{otherwise} \end{cases}$$ (5)

where ε is another user-prescribed parameter standing for the accuracy demanded of the approximation. By introducing two slack variables ξi and ξi*, representing errors exceeding +ε and −ε, respectively, the objective function in Eq. (4) is reformulated as:

$$Q = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*)$$ (6)

subject to:

$$\begin{cases} y_i - w^T \varphi(x_i) - b \le \varepsilon + \xi_i \\ w^T \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i, \; \xi_i^* \ge 0, \quad i = 1, 2, \ldots, l \end{cases}$$

The constraints thus bound the errors between the regression predictions (w^T φ(xi) + b) and the true values yi. The minimization of the objective function is a linearly constrained quadratic programming (QP) problem. By introducing a Lagrangian function, one obtains:

$$w = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\,\varphi(x_i)$$ (7)

The optimal regression function is:

$$f(x) = w^T \varphi(x) + b = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\,\langle \varphi(x_i), \varphi(x) \rangle + b$$ (8)

where αi and αi* are the introduced Lagrange multipliers, satisfying αi·αi* = 0, αi ≥ 0, αi* ≥ 0. Based on the Karush-Kuhn-Tucker (KKT) conditions of quadratic programming, only a certain number of the coefficients (αi − αi*) are non-zero, and the corresponding data pairs are called support vectors in SVM theory. ⟨φ(xi), φ(x)⟩ is the dot product between φ(xi) and φ(x). Clearly, only these support vectors determine the position of the regression function according to Eq. (8). The parameter b can be computed by feeding any one support vector into Eq. (8), but use of the average of the b values over all support vectors is suggested here.
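To make Eq. (5) concrete, the short Python sketch below evaluates the linear ε-insensitive loss for arrays of measured and predicted values. It is an illustration only (the paper itself contains no code), and the numbers used are made up.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps):
    """Linear epsilon-insensitive loss of Eq. (5): zero inside the
    +/- eps tube around the target, |f(x) - y| - eps outside of it."""
    residual = np.abs(np.asarray(y_pred) - np.asarray(y_true))
    return np.where(residual < eps, 0.0, residual - eps)

# With eps = 0.01, an error of 0.005 costs nothing and an error of 0.03 costs 0.02.
print(eps_insensitive_loss([0.50, 0.50], [0.505, 0.53], 0.01))   # -> [0.   0.02]
```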


Note that it is difficult to compute the nonlinear mapping φ(x), and thus a kernel function is introduced to replace the inner product in the feature space, i.e.:

$$K(x_i, x) = \langle \varphi(x_i), \varphi(x) \rangle$$ (9)

The advantage of using the kernel function is that feature spaces of arbitrary dimensionality can be handled without computing the map φ(·) explicitly. The regression function can then be rewritten as:

$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\,K(x_i, x) + b$$ (10)

In theory, the kernel function K(xi, x) can be any symmetric function satisfying Mercer's condition. In practical applications, however, the kernel is selected according to the given system. Typical kernels include:
I. Polynomial kernel: K(xi, x) = (xi^T x + 1)^d;
II. Sigmoid kernel: K(xi, x) = tanh(β1 xi^T x + β2);
III. Gaussian radial basis kernel: K(xi, x) = exp(−0.5 ‖x − xi‖²/σ²),
where d, β1, β2, and σ are specified in advance by the user. More detailed explanations of the SVM algorithms can be found in the tutorial book [10].
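For illustration, the three kernels listed above, and the evaluation of Eq. (10) once the non-zero coefficients (αi − αi*) are known, can be written in a few lines of Python. This is a sketch under the stated equations, not code from the original study; the parameter defaults are arbitrary.

```python
import numpy as np

def poly_kernel(xi, x, d=3):
    """Polynomial kernel K(xi, x) = (xi^T x + 1)^d."""
    return (np.dot(xi, x) + 1.0) ** d

def sigmoid_kernel(xi, x, beta1=1.0, beta2=0.0):
    """Sigmoid kernel K(xi, x) = tanh(beta1 * xi^T x + beta2)."""
    return np.tanh(beta1 * np.dot(xi, x) + beta2)

def rbf_kernel(xi, x, sigma=1.0):
    """Gaussian radial basis kernel K(xi, x) = exp(-0.5 * ||x - xi||^2 / sigma^2)."""
    return np.exp(-0.5 * np.sum((np.asarray(x) - np.asarray(xi)) ** 2) / sigma ** 2)

def svr_predict(x, support_vectors, coeffs, b, kernel=poly_kernel):
    """Eq. (10): f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b, where
    coeffs holds the non-zero (alpha_i - alpha_i*) of the support vectors."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coeffs)) + b
```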

2.2 Rolling Learning-Prediction Procedure

Yuan and Vanrolleghem [7] proposed the rolling learning-prediction procedure to deal with the time variant property of the process and to establish a training database. Taking the product concentration prediction in penicillin production as an example, a brief introduction to the rolling learning-prediction procedure is given here.

As shown in Fig. 1, two data windows, an input data window (solid frame) and an output data window (dotted frame), are employed to frame input-output data pairs in the rolling learning-prediction procedure. The widths of the input and output data windows are defined as TD and TP, respectively. TD determines the time span of the input vector, and TP the length of the prediction horizon. Both windows move along the time scale with a fixed moving step TM. For simplicity, assuming that TM equals the sampling interval TS, a series of input-output data pairs can be obtained by discretizing the transients of the process variables covered by each data window. Defining the input-output data pair corresponding to the kth data window as {X(Tk), Y(Tk)}, then:

$$X(T_k) = [\,T_k \;\; x(T_k) \;\; x(T_k - s) \;\; x(T_k - 2s) \;\; \cdots \;\; x(T_k - ms)\,]^T$$ (11)

$$x(T_k) = [\,P(T_k) \;\; PAA(T_k) \;\; S(T_k) \;\; Nit(T_k) \;\; O_2(T_k) \;\; CO_2(T_k) \;\; V(T_k) \;\; pO_2(T_k) \;\; pH(T_k) \;\; \cdots\,]^T$$ (12)

$$Y(T_k) = [\,P(T_k + T_P)\,]^T$$ (13)

where Tk is the current sampling time at the right border of the input data window, so that T1 = TD; the constant s is the discretization time interval and equals an integer multiple of TS; m is the number of dating-back steps, which equals TD/s; and P(Tk + TP) is the product concentration at sampling time (Tk + TP).

Figure 1. Schematic diagram of the input and output data windows, where the variables P, S, and PAA have been normalized between 0 and 1.

These input-output data pairs are used to construct the training database for rolling learning-prediction. The training database contains two parts. One part consists of all input-output data pairs from n normal historical charges, h1∼n; the other part is composed of all input-output data pairs available up to the current sampling time for the (n + 1)th charge (i.e., the charge of present interest), hn+1. Supposing the cultivation period of the ith historical charge is Ti,f, the total number of input-output data pairs of this charge will be:

$$N_i = \mathrm{int}\!\left(\frac{T_{i,f} - T_D - T_P}{T_M}\right)$$ (14)

Therefore:

$$h_{1\sim n} = \{X_i(T_k), Y_i(T_k)\}, \quad k = 1, 2, \ldots, N_i, \;\; i = 1, 2, \ldots, n$$ (15)

As for hn+1, assuming the current sampling time is Tk, we have:

$$h_{n+1} = \{X_{n+1}(T_1), Y_{n+1}(T_1), X_{n+1}(T_2), Y_{n+1}(T_2), \ldots, X_{n+1}(T_{k-j}), Y_{n+1}(T_{k-j})\}$$ (16)

$$j = \mathrm{int}(T_P/T_M)$$ (17)

The output data vectors corresponding to Xn+1(Tk−j+1) through Xn+1(Tk) do not exist, since the corresponding future measurements are not yet available; they are exactly the values to be predicted by the predictor. Furthermore, the number of data pairs of the (n + 1)th charge is limited even when it runs during the final phase of cultivation. Owing to the individual character information contained in the (n + 1)th charge, however, hn+1 plays an important role in improving the prediction accuracy.

Based on the training database {h1∼n, hn+1} and the standard SVM regression algorithm, the optimal regression function in Eq. (10) can be computed. Feeding the latest input data vector to the predictor then yields the prediction of the future product concentration. When the next sampling time comes, the training database {h1∼n, hn+1} is updated and the learning-prediction process is repeated.
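The window arithmetic of Eqs. (11)–(13) can be sketched as follows. This is an illustrative fragment rather than the authors' code: it assumes each charge is stored as a NumPy array with one row per sampling instant TS and one column per selected process variable (P in the first column), and that TD, TP, and s are integer multiples of TS.

```python
import numpy as np

def build_pairs(records, times, ts=4, td=40, tp=40, s=8, p_col=0):
    """Frame the {X(Tk), Y(Tk)} pairs of one charge according to Eqs. (11)-(13).

    records : (n_samples, n_vars) array of process variables, P in column p_col
    times   : (n_samples,) array of sampling times in hours, spaced by ts
    td, tp, s : input window width, prediction horizon and discretization
                interval in hours (values here are only illustrative defaults)
    """
    m = td // s            # dating-back steps, m = TD / s
    step = s // ts         # samples per discretization interval s
    horizon = tp // ts     # samples per prediction horizon TP
    pairs = []
    # Tk runs from the first complete input window up to the last sampling
    # time whose target P(Tk + TP) still lies inside the recorded charge.
    for k in range(m * step, len(records) - horizon):
        past = [records[k - i * step] for i in range(m + 1)]        # x(Tk), ..., x(Tk - m s)
        x_vec = np.concatenate([[times[k]], np.concatenate(past)])  # Eq. (11)
        y_val = records[k + horizon, p_col]                         # Eq. (13): P(Tk + TP)
        pairs.append((x_vec, y_val))
    return pairs
```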

2.3 Implementation Steps

The implementation steps are summarized as follows:
– Step 1: Determine the parameters TD, TP, TM, s, and m, select the process variables, and determine the kernel and the parameters C and ε of the SVM predictor.
– Step 2: Select n historical charges to construct the set of input-output data pairs h1∼n.
– Step 3: Collect all input-output data pairs of the (n + 1)th charge (the predicted charge) available up to the current sampling time to construct hn+1.
– Step 4: Based on the training database {h1∼n, hn+1} and the standard SVM regression algorithm, compute the optimal regression function in Eq. (10).
– Step 5: Feed the latest input data vector into the SVM predictor to obtain the corresponding prediction of the product concentration.
– Step 6: When the next sampling time comes, go to Step 3, and repeat the learning and prediction process until the predicted charge is terminated.
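Steps 1–6 amount to retraining and predicting once per sampling interval. The sketch below shows one possible arrangement and is not the authors' implementation: it uses scikit-learn's SVR as a stand-in ε-SVR solver with the kernel and parameters later listed in Tab. 1 (gamma = 1 and coef0 = 1 make the built-in polynomial kernel equal to (xi^T x + 1)^3), and it assumes the pair lists were produced by a windowing helper such as the hypothetical build_pairs above.

```python
import numpy as np
from sklearn.svm import SVR

def rolling_predict(historical_pairs, running_charge):
    """Rolling learning-prediction: at every sampling time, retrain on
    {h_1~n, h_n+1} and predict P(Tk + TP) from the latest input vector.

    historical_pairs : list of (X, y) pairs pooled from n past charges (Step 2)
    running_charge   : iterable yielding, at each sampling time, the pairs of
                       the predicted charge available so far and the latest
                       input vector X(Tk) (Step 3)
    """
    base_X = [x for x, _ in historical_pairs]
    base_y = [y for _, y in historical_pairs]
    predictions = []
    for pairs_so_far, latest_x in running_charge:
        X = np.array(base_X + [x for x, _ in pairs_so_far])
        y = np.array(base_y + [y for _, y in pairs_so_far])
        # Step 1 / Tab. 1: third-degree polynomial kernel, C = 1000, epsilon = 0.01
        model = SVR(kernel="poly", degree=3, gamma=1.0, coef0=1.0, C=1000.0, epsilon=0.01)
        model.fit(X, y)                                                        # Step 4
        predictions.append(float(model.predict(latest_x.reshape(1, -1))[0]))   # Step 5
    return predictions                                                         # Step 6: loop runs once per TS
```

Retraining from scratch at every sampling time is the simplest reading of the procedure; the SMO-type solver wrapped by scikit-learn (libsvm) keeps each retraining inexpensive, in line with the computational remarks in Section 4.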

3 Case Study

3.1 Industrial Data and Parameter Selection

As a practical example, the product concentration in commercial penicillin production is predicted here. A total of thirty-five normal industrial charges were collected from a Chinese pharmaceutical factory with a sampling interval of TS = 4 h. For confidentiality reasons, all of the involved data (as well as the data in the following text) have been normalized between 0 and 1. Among those charges, twenty are picked out to construct the set of training data pairs h1∼20, five charges are chosen as a cross-validation set used to tune the parameters of the predictors, and another ten charges are used as testing charges. The product concentration distribution of the testing charges is plotted in Fig. 2, where they are numbered from Charge 1 to 10.

The dynamically measured variables P, PAA, S, Nit, and V are most closely associated with the product concentration, and they are therefore chosen as the input variables included in the input data window. The parameters TD, TM, s, and m are set as follows: TD = 40 h, TM = TS = 4 h, s = 8 h, and m = 5, respectively.


Figure 2. Product concentration distribution of ten testing charges.

Therefore, every input-output data pair contains thirty-two elements (thirty-one inputs and one output). The width of the output window, TP, determines the length of the prediction horizon. To test the prediction ability fully, TP is set to 1 to 10 sampling intervals, i.e., 4–40 h.

The relevant parameters of the SVM and of the three ANNs (FBNN, RBFN, and RNN) are determined by cross-validation in order to obtain good prediction performance; they are given in Tab. 1. For the SVM, only the kernel function and the parameters C and ε need to be determined, which is relatively easy. For the three ANNs, however, determining the network topology and parameters by cross-validation is a laborious task. In this sense, the SVM is advantageous in that no network topology has to be considered. The SVM and the three ANNs are trained using the same training database and tested using the same testing charges.

Table 1. Topologies and parameters of the SVM and the three ANNs.

Parameter             SVM               FBNN      RBFN       RNN
Kernel                (xi^T x + 1)^3    –         –          –
C                     1000              –         –          –
ε                     0.01              –         –          –
Net layers            –                 3         2          3
Hidden neurons        –                 3         –          3
TF in hidden layer    –                 Sigmoid   –          Sigmoid
TF in other layers    –                 Linear    Gaussian   Linear

TF denotes the transfer function in the ANN; Gaussian denotes the Gaussian radial basis function.

In addition, the product concentration behaves quasi-linearly during the middle phase of cultivation in a well controlled factory (see Fig. 2). Generally speaking, linear extrapolation can therefore be applied in this stage for short term prediction, e.g., over one or two sampling intervals.


As a comparison, the product concentration prediction is also made by a linear extrapolation technique, in which only the previous measurements of P are needed. The prediction accuracy of linear extrapolation is evaluated separately for two stages, i.e., the middle phase with 40 h < cultivation time ≤ 140 h (Linex1) and the later phase with cultivation time > 140 h (Linex2). The data window and its moving step length for linear extrapolation are set to the same values as those of the rolling learning-prediction.

Prediction performance is evaluated based on the relative prediction error between the actual value and the predicted value, defined as:

$$e(T_k + T_P) = \frac{P_R(T_k + T_P) - P_P(T_k + T_P)}{P_R(T_k + T_P)}$$ (18)

where PR(Tk + TP) is the actual value at sampling time (Tk + TP) and PP(Tk + TP) is the corresponding predicted value. Eq. (19) gives the root mean square error (RMSE) of a testing charge containing q prediction points in total:

$$\mathrm{RMSE} = \sqrt{\frac{1}{q}\sum_{k=1}^{q} e(T_k + T_P)^2}$$ (19)
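A direct transcription of Eqs. (18) and (19) into NumPy might look as follows; the variable names are illustrative and not taken from the paper.

```python
import numpy as np

def relative_error(p_real, p_pred):
    """Relative prediction error of Eq. (18)."""
    p_real = np.asarray(p_real, dtype=float)
    return (p_real - np.asarray(p_pred, dtype=float)) / p_real

def charge_rmse(p_real, p_pred):
    """Root mean square error of Eq. (19) over the q prediction points of one testing charge."""
    e = relative_error(p_real, p_pred)
    return float(np.sqrt(np.mean(e ** 2)))
```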

3.2 Prediction Results

Fig. 3 gives the average prediction RMSE of the ten testing charges for 4–40 h ahead prediction. Except for Linex2 (linear extrapolation during the later phase of cultivation), all methods achieve a prediction error below 5 %. Clearly, the SVM has the smallest prediction error, and RBFN takes second place. The short term predictions (1–3 sampling points) of Linex1 and Linex2 are comparable with those of the SVM, but the long term predictions are much poorer, especially the long term prediction of Linex2. In addition, Fig. 3 indicates that the RMSE of all prediction methods increases steadily with increasing prediction horizon.

Figure 3. Average RMSE of ten testing charges.

Fig. 4 further presents the prediction RMSE of each of the testing charges for 40 h ahead prediction. It shows that the prediction errors of Linex1 and Linex2 vary greatly from charge to charge, while the prediction errors of the SVM and the three ANNs are relatively steady. This indicates that linear extrapolation is not very reliable in practical application.

Figure 4. The RMSE of ten testing charges for 40 h ahead prediction.


3.3 Influence of Measurement Noise

The continuously measured variables and the laboratory assay data are unavoidably polluted by measurement noise, so the raw industrial data used in this paper are already noisy. In spite of this, extra Gaussian noise is added to test the robustness of all the prediction methods in an extremely noisy environment. Here, extra noise is added to the sampling time and to all process variables in the database, as well as to the input data vectors. Fig. 5 shows some process variables of an industrial charge polluted with 5 % and 10 % additional Gaussian noise; such a charge thus runs under extremely noisy conditions.

Figs. 6 and 7 show the average prediction RMSE of the ten testing charges under 5 % and 10 % additional Gaussian noise, respectively. It must be stated here that the measured product concentration PR(Tk + TP) (see Eq. (18)) used for calculating the relative prediction error is the original value, unpolluted by the extra noise. Compared with Fig. 3, the prediction accuracies decrease considerably for all prediction methods, but the SVM exhibits the best tolerance to noise. Even under 10 % additional Gaussian noise, the average RMSE of the SVM is still below 5 %. Moreover, the prediction errors of the SVM and the three ANNs no longer increase noticeably with increasing prediction horizon TP; the extra noise has become the main factor influencing the prediction performance.
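The noise test can be reproduced in outline with a helper like the one below, which multiplies every value by a random factor centered on 1. The paper does not spell out the exact noise model, so reading "5 % / 10 % additional Gaussian noise" as zero-mean Gaussian noise with a standard deviation of 5 % or 10 % of the measured value is an assumption of this sketch.

```python
import numpy as np

def add_relative_noise(data, level, seed=None):
    """Corrupt measurements with zero-mean Gaussian noise whose standard deviation
    is `level` (e.g. 0.05 or 0.10) times the measured value itself."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    return data * (1.0 + level * rng.standard_normal(data.shape))
```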


Figure 5. Process variables of an industrial charge polluted with 5 % and 10 % additional Gaussian noise.

3.4 Influence of a Small Number of Samples

The prediction accuracy is also investigated for small numbers of training samples. Although it is not difficult to collect enough historical charges to establish a training database for a multi-bioreactor plant, the ability to deal with small-sample problems is worth discussing. Moreover, provided the prediction accuracy barely decreases, a smaller number of samples is advantageous, since a large number of training samples inevitably leads to a large computational effort. Fig. 8 gives the prediction results obtained with a training database composed of only ten charges, h1∼10. In contrast to Fig. 3, the prediction accuracy of the SVM shows only a slight decrease although the number of training pairs is reduced by almost half, whereas the prediction accuracies of the three ANN methods decrease markedly. This result supports the theoretical view that the SVM has an advantage in handling small-sample problems. Naturally, the number of training samples cannot be reduced without limit. For the example given in this paper, the SVM is still able to obtain a prediction error below 5 % for 40 h ahead prediction with a training database h1∼7, whereas the corresponding prediction accuracy of the three ANNs has already deteriorated badly: their average RMSE over the ten testing charges even exceeds 15 % for 40 h ahead prediction. A further simulation reveals that the prediction accuracy of the SVM also deteriorates sharply if the number of training pairs is reduced further.

Figure 6. Average RMSE of ten testing charges under 5 % additional Gaussian noise.

Figure 7. Average RMSE of ten testing charges under 10 % additional Gaussian noise.

Figure 8. Average RMSE of ten testing charges with a small number of training samples.

4 Discussion and Conclusions


From the above prediction results for the product concentration in penicillin production, whether with sufficient training samples, under extremely noisy conditions, or with a small number of training samples, the SVM outperforms the linear extrapolation technique and the three ANN methods.


Occasionally the prediction accuracy of RBFN is as good as that of the SVM, but as a whole the prediction accuracy of the SVM is higher and more reliable. During the middle phase of cultivation, where the product concentration behaves quasi-linearly, the prediction accuracy of linear extrapolation is satisfactory, but it worsens during the later phase of cultivation and/or with large measurement noise.

For the SVM, the number of support vectors is closely correlated with the complexity of the given system. For the example given in this paper, the number of support vectors ranges from 35 % to 45 % of all training vectors, and it increases slightly when the prediction horizon is extended. When 5 % or 10 % additional Gaussian noise is added to the training data, the percentage rises to 65–75 %, which simply reflects the higher complexity of data polluted by extra noise. In addition, with a small number of training samples (ten historical charges), the percentage also rises, to 69–79 % of all training vectors, which means that the absolute number of support vectors decreases only slightly. This can be regarded as an intuitive explanation of why the SVM copes well with small-sample problems.

In the past, a remaining issue of the SVM was its comparatively long computational time. Recently, however, efficient training algorithms have been proposed and developed to improve the computational speed [17–19]. For the product concentration prediction in this study, the time consumption of the SVM is almost equivalent to that of the ANNs after adopting the sequential minimal optimization (SMO) algorithm proposed by Platt [17].
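For readers who want to reproduce the support-vector fractions quoted above with a generic ε-SVR implementation, the count can be read directly from a fitted model. The snippet below uses scikit-learn attribute names and random placeholder data; it is not part of the original study.

```python
import numpy as np
from sklearn.svm import SVR

# Placeholder training set with the 31-element input vectors of Section 3.1.
X_train, y_train = np.random.rand(200, 31), np.random.rand(200)

model = SVR(kernel="poly", degree=3, gamma=1.0, coef0=1.0, C=1000.0, epsilon=0.01)
model.fit(X_train, y_train)

# support_ holds the indices of the training vectors with non-zero (alpha_i - alpha_i*).
sv_fraction = len(model.support_) / len(X_train)
print(f"support vectors: {sv_fraction:.0%} of the training set")
```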

Acknowledgements

The authors gratefully acknowledge the support of the Natural Science Foundation of China (Grant No. 6057438) and the National High Technology Research and Development Program of China (Grant No. 2001AA413110, Open Project Program of the State Key Laboratory of Bioreactor Engineering/ECUST). An earlier stage of this work was supported by the Alexander von Humboldt Foundation, Germany.

Received: May 5, 2005

Symbols used

b [–] scalar threshold value
C [–] penalty factor in SVM
CO2 [kmole] total carbon dioxide production
e [–] relative prediction error
K(xi, x) [–] kernel function in SVM
Lε(·) [–] ε-insensitive loss function in SVM
m [–] dating back steps
Nit [kg] total nitrogen source consumption
O2 [kmole] total oxygen consumption
P [kg/m3] product concentration
PAA [kg] total phenyl acetic acid consumption
pH [–] pH value
pO2 [%] dissolved oxygen
RMSE [–] root mean square error
S [kg] total sugar consumption
TD [h] width of input data window
Tk [h] current sampling time at the right border of the input data window
TM [h] moving step width of the input data window
TP [h] width of output data window
TS [h] sampling interval
V [m3] liquid volume in the bioreactor
w [–] weight vector in SVM
φ(·) [–] nonlinear mapping function in SVM
h1∼n [–] set of input-output data pairs of the 1∼n historical charges
hn+1 [–] set of input-output data pairs of the predicted charge
ε [–] precision parameter of the ε-insensitive loss function in SVM
s [h] discretization time interval
ξi, ξi* [–] slack variables in SVM

References

[1] N. A. Jalel, J. I. Leigh, M. Fiacco, J. R. Leigh, in Proc. of the IEEE Conf. on Control Applications, IEEE, Piscataway, NJ, USA 1994.
[2] J. O. Rawlings, Applied Regression Analysis: A Research Tool, Wadsworth, Belmont, California 1988.
[3] J. V. Kresta, T. E. Marlin, J. F. MacGregor, Comput. Chem. Eng. 1994, 18 (7), 597.
[4] W. W. Yan, H. H. Shao, X. F. Wan, Comput. Chem. Eng. 2004, 28 (8), 1489.
[5] M. R. Warnes, J. Glassey, G. A. Montague, B. Kara, Neurocomputing 1998, 20 (1–3), 67.
[6] C. Shene, C. Diez, S. Bravo, Comput. Chem. Eng. 1999, 23 (8), 1097.
[7] J. Q. Yuan, P. A. Vanrolleghem, J. Biotechnol. 1999, 69 (1), 47.
[8] V. N. Vapnik, Soviet Mathematics 1968, 9, 915.
[9] V. N. Vapnik, Estimation of Dependences Based on Empirical Data, Springer-Verlag, New York 1982.
[10] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin 1995.
[11] E. Theodoros, P. Tomaso, P. Massimiliano, V. Alessandro, Comput. Stat. Data Anal. 2002, 38 (4), 421.
[12] U. Thissen et al., Chemometrics Intell. Lab. Syst. 2003, 69, 35.
[13] M. A. Mohandes, T. O. Halawani, S. Rehman, A. A. Hussain, Renew. Energy 2004, 29 (6), 939.
[14] C. Z. Cai, W. L. Wang, L. Z. Sun, Y. Z. Chen, Math. Biosci. 2003, 185 (2), 111.
[15] C. J. C. Burges, Data Min. Knowl. Discov. 1998, 2, 121.
[16] A. J. Smola, Master's Thesis, Technische Universität München, 1996.
[17] J. Platt, in Advances in Kernel Methods – Support Vector Learning (Eds: B. Schölkopf, C. J. C. Burges, A. J. Smola), MIT Press, Cambridge 1999.
[18] T. Joachims, in Advances in Kernel Methods – Support Vector Learning (Eds: B. Schölkopf, C. J. C. Burges, A. J. Smola), MIT Press, Cambridge 1999.
[19] S. Fine, K. Scheinberg, J. Mach. Learn. Res. 2001, 2, 243.
