Computers and Chemical Engineering 30 (2006) 1392–1399

Bayesian-based on-line applicability evaluation of neural network models in modeling automotive paint spray operations

Jia Li, Yinlun Huang*
Department of Chemical Engineering and Materials Science, Wayne State University, Detroit, MI 48202, United States

Received 18 February 2005; received in revised form 26 February 2006; accepted 9 March 2006; available online 18 May 2006
doi:10.1016/j.compchemeng.2006.03.005

* Corresponding author. Tel.: +1 313 577 3771; fax: +1 313 577 3810. E-mail address: [email protected] (Y. Huang).

Abstract

Neural network (NN) models that are well trained and validated on the same data may still exhibit noticeably different predictive performance in application. This is mainly because the knowledge captured by the NNs during training may differ in depth and breadth. Using a set of nearly equally superior models, instead of a single one, can therefore make on-line prediction of system performance more robust. An unresolved issue, then, is how to value the prediction of each model in the set at each application step. In this paper, we introduce a Bayesian-based model-set management method for constructing a statistically superior model set for on-line application. Specifically, the method manages the model set by assigning statistically appropriate weights to the model predictions; the weighted model predictions determine the overall description of system behavior. Repeated use of the method keeps the weights updated as new system data become available, which makes the model-set-based description of system performance more precise and robust. The efficacy of the method is demonstrated by studying an automotive paint spray process in which the thin film thickness on the vehicle surface must be predicted precisely.

Keywords: Bayesian method; Neural network modeling; Model-set-based prediction; On-line model applicability evaluation

1. Introduction

Neural network (NN) techniques have been widely used in system modeling, information classification, process control, and optimization in numerous industrial applications (Meireles, Almeida, & Simoes, 2003). An NN is usually structured as connected nodes in a certain topology. In model development, the weights of the connections are adjusted by a training algorithm, which is a nonlinear optimization process (Dayholf, 1992). Numerous NN training/validation algorithms and tools are available that are effective in generating superior models (Hamm, Brorsen, & Hagan, 2002; Huang, Edgar, Himmelblau, & Trachtenberg, 1994; Kwok & Yeung, 1997; Mackey, 1992a, 1992b; Porto, Fogel, & Fogel, 1995; Reed, 1993). A well trained and validated NN should extract sufficient knowledge hidden in a pool of quality data; it should then be capable of using that knowledge to characterize adequately the system of interest. An NN is always expected to be reliable in terms of prediction accuracy and generalization capability in application (Alippi, 2002; Levin, Tishby, & Solla, 1990).

A major challenge of model application is how to ensure prediction accuracy continuously. In general, the training and validation data can never be perfect. On the other hand, industrial systems always experience various disturbances and fluctuations, possibly not conceivable in advance, due to a variety of reasons. Hence, the system behavior demonstrated under these circumstances may not be fully captured by a single model adopted according to a specific criterion used in training (Yao, 1999). Some nearly equally good, but discarded, models, either topology-wise or parameter-wise different from the adopted one, could have been chosen if a different model selection criterion had been used. These models, trained and validated on the same data as the adopted model, may demonstrate better prediction and generalization capability under certain circumstances in application. It is argued, therefore, that it may be advantageous to adopt multiple models for application; certainly, these models should all be superior with respect to the criteria used in model development. Petridis and Kehagias (1996, 1998) and Petridis et al. (2001) suggested using multiple models to produce predictions based on their weighted outputs. In this regard, the major challenge changes from how to identify the most suitable model, which is always arguable, to how to use a set of nearly equally superior models properly (Busemeyer & Wang, 2000; Forster, 2000).

One of the key tasks of using a model set is to determine how to evaluate the applicability of each model under each specific condition; this is a model-set management issue. It is conceivable that the applicability of each model may be time dependent, as the system environment changes over time. Thus, the applicability of each model should be evaluated periodically for the most suitable use.

In this paper, a model-set-based system characterization scheme is described first. Then, a Bayesian-based model-set management method is introduced for constructing a statistically superior model set for on-line application. Specifically, the method evaluates each model in the model set under each new application environment, which is described by newly available system data. The overall system characterization is based on the weighted model predictions. Repeated use of the method keeps the weights updated as new system data become available, which makes the model-set-based description of system performance more precise and robust. To automate the model weight update in on-line application, a Bayesian-based application procedure is also introduced. The efficacy of the method is demonstrated by a case study on automotive thin film thickness prediction using a set of structurally and/or parametrically different NN models.


2. Multiple model-based system prediction

Given a model set, M, which contains N well-trained and validated models:

$M = \{m_1, m_2, \ldots, m_N\}$   (1)

Each model has the same type of inputs, $X \in R^p$, and outputs, $Y \in R^q$, as they are built for the same application, but their structures and/or parameters are different. At time tk, if the following set of input data is available:

$D_x(t_k) = \{x_1(t_k), x_2(t_k), \ldots, x_p(t_k)\}$   (2)

then the N models can be used to generate the following output predictions:

$\hat{Y}(t_k) = \{\hat{y}_1(t_k), \hat{y}_2(t_k), \ldots, \hat{y}_N(t_k)\}$   (3)

where ŷi(tk) is the prediction of system behavior by model mi at time tk; the behavior is characterized by the following q output variables:

$\hat{y}_i(t_k) = \{\hat{y}_{i,1}(t_k), \hat{y}_{i,2}(t_k), \ldots, \hat{y}_{i,q}(t_k)\}, \quad i = 1, 2, \ldots, N$   (4)

Model set M can be used repeatedly to predict system performance continuously. The time series predictions over K steps can be expressed as

$\hat{Y} = \{\hat{Y}(t_1), \hat{Y}(t_2), \ldots, \hat{Y}(t_K)\}$   (5)

The predictions by different models in each time step, as expressed in Eq. (5), are, in general, different in precision. If the measured outputs are

$D_y(t_k) = \{y_1(t_k), y_2(t_k), \ldots, y_q(t_k)\}$   (6)

then the prediction error of output j by model mi at time tk can be evaluated as

$e_{i,j}(t_k) = y_j(t_k) - \hat{y}_{i,j}(t_k), \quad i = 1, 2, \ldots, N;\; j = 1, 2, \ldots, q$   (7)

The prediction errors may not be very significant, as the models in set M are all superior. Note that it is very unlikely that a single model can maintain the best predictions over the entire prediction horizon; it is very possible that the model providing the most accurate prediction differs from one prediction time to the next. Thus, it is best if a set of models, rather than a single model, is adopted in application. Nevertheless, the contribution of each model to the overall system characterization at each specific time instant should be carefully determined. The adequacy of the model predictions can be recognized by determining the weights associated with the predictions; each weight represents a preference for an individual model in the system description. Fig. 1 depicts the model-set-based system characterization, where the weights associated with the models are to be adjusted. Let us define a model preference set, W(tk), at time tk as

$W(t_k) = \{w_1(t_k), w_2(t_k), \ldots, w_N(t_k)\}$   (8)

Thus, the system prediction at time tk, Ŷ(tk), by the model set can be obtained by

$\hat{Y}(t_k) = \sum_{i=1}^{N} w_i(t_k)\,\hat{y}_i(t_k)$   (9)

Note that Ŷ(tk) is a weighted summation that contains the predicted process information of all outputs by all models, as defined in Eq. (3).

Fig. 1. System diagram with model on-line evaluation.
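As a concrete illustration of Eqs. (1)-(9), the sketch below (Python/NumPy; the model objects, names and numbers are hypothetical placeholders, not from the paper) combines the predictions of N models into the weighted model-set prediction of Eq. (9).

```python
import numpy as np

def model_set_prediction(models, weights, x):
    """Weighted model-set prediction, Eq. (9): Y_hat(t_k) = sum_i w_i(t_k) * y_hat_i(t_k).

    models  : list of N callables, each mapping an input vector x of length p to an output vector of length q
    weights : array of N weights w_i(t_k), summing to 1 (the model preference set W(t_k), Eq. (8))
    x       : input data D_x(t_k), Eq. (2)
    """
    predictions = np.array([m(x) for m in models])   # shape (N, q), the prediction set of Eq. (3)
    return weights @ predictions                     # shape (q,), the weighted sum of Eq. (9)

# Illustrative use with two placeholder "models" standing in for trained NNs
models = [lambda x: np.array([0.9 * x.sum()]),
          lambda x: np.array([1.1 * x.sum()])]
weights = np.array([0.5, 0.5])                        # uniform weights reduce Eq. (9) to plain averaging
x = np.array([0.2, 0.3, 0.1, 0.25, 0.15])             # five spray parameters at time t_k (made-up values)
print(model_set_prediction(models, weights, x))
```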



If the weights wi(tk) (i = 1, 2, ..., N) are all set to 1/N, the prediction becomes the usual model averaging, which attenuates noise from all models equally. The Bayesian method introduced in this work assigns different weights to the models based on their past performance, so that the model set used in this way provides better predictions. The predictability of the model set is largely determined by the weights assigned to the N well-trained models. Thus, the question is how to select and update the weights periodically. Bayes' theory (Bernardo & Smith, 1994) can be adopted as a basis for developing a method for determining the weights associated with the models.

3. Bayesian method basics

Bayes' theory-based methods use probabilities to describe the degree of belief in parameter values or models, and can give statistical results based on known data (Hoeting, Madigan, Raftery, & Volinsky, 1999; Vila, Wagner, & Neveu, 2000; Wasserman, 2000). The methods have been adopted in a wide array of parameter identification and model comparison approaches in both linear and nonlinear regression modeling (Denison, Holmes, Mallick, & Smith, 2002; Raftery, Madigan, & Hoeting, 1997). In data analysis, uncertain entities are modeled as probability distributions, and inference is performed by constructing posterior conditional probabilities for the unobserved variables of interest, given observed data samples and prior assumptions. Mathematically, Bayes' theorem is expressed as follows (Bernardo & Smith, 1994):

$P(H|D) = \dfrac{P(H)\,P(D|H)}{P(D)}$   (10)

where H is the hypothesis about a statistical property of the system of interest; D is the evidence used to update the belief in H (note that the evidence is in addition to that used for making the hypothesis); P(H) is the prior probability, which is the initial belief in H without D; P(D) is the probability of the occurrence of D; P(D|H) is the probability of observing D, assuming H is true; and P(H|D) is the posterior probability of H after considering the effect of observing D. The Bayesian method can be used to construct a posterior probability that provides valuable information for model evaluation (Lampinen & Vehtari, 2001; Sato, 2001). In this application, hypothesis H and evidence D in Eq. (10) refer, respectively, to model m and data set D. Thus, we have

$P(m|D) = \dfrac{P(m)\,P(D|m)}{P(D)}$   (11)

Note that D is the newly available data set for evaluating model m. The prior probability P(m) contains the prior knowledge about the applicability of model m. Thus, the posterior probability P(m|D) is evaluated based on the past information about the model and the new process information. The model with a large posterior probability should be more desirable in model selection.

4. Bayesian-based model applicability assessment

According to the analysis of Eq. (11), the posterior probability of a model can be considered as a preference for the model. Note that the model preference may change over time because a model's prediction performance may vary. Thus, the weights assigned to the models may change over time. The engineering basis for weight change is established by viewing new application situations, which are reflected by the newly collected data, and by examining the application performance of all the models in the model set.

4.1. Time-dependent Bayesian expression

Assume that the system information is available at time tk−1, and that the input and output data are contained, respectively, in Dx(tk−1) (defined in Eq. (2)) and Dy(tk−1) (defined in Eq. (6)). Then, the Bayesian expression in Eq. (11) can be extended to the following time-variant form:

$w_i(t_k) = P(m_i|D_y(t_{k-1})) = \dfrac{P(m_i(t_{k-1}))\,P(D_y(t_{k-1})|m_i)}{P(D_y(t_{k-1}))}, \quad i = 1, 2, \ldots, N$   (12)

where P(mi(tk−1)) is the prior probability, which is the initial belief in model mi before observing Dx(tk−1) and Dy(tk−1); P(Dy(tk−1)) is the probability of the occurrence of Dy(tk−1); P(Dy(tk−1)|mi) is the probability of observing Dy(tk−1), assuming model mi is true; and P(mi|Dy(tk−1)) is the preference for model mi after considering the effect of observing Dy(tk−1), which is defined as the weight wi(tk) associated with model mi at time tk.

4.2. Likelihood estimation

It is assumed that the prediction error ei,j(tk−1) in Eq. (7) has a normal (Gaussian) distribution with zero mean (µ = 0), i.e.,

$e_{i,j} \sim N(0, \sigma_{i,j}), \quad i = 1, 2, \ldots, N;\; j = 1, 2, \ldots, q$   (13)

For simplicity, it is further assumed that the deviation σi,j is identical across models (i.e., model independent):

$\sigma_{i,j} = \sigma_j, \quad i = 1, 2, \ldots, N$   (14)

Thus, the normal probability density function for the jth output is

$f_j(y) = \dfrac{1}{\sigma_j\sqrt{2\pi}}\exp\left(-\dfrac{1}{2}\left(\dfrac{y}{\sigma_j}\right)^2\right), \quad j = 1, 2, \ldots, q$   (15)

Since the prediction errors of the q outputs are all assumed to have a normal distribution, the likelihood, P(D(tk−1)|mi), can be expressed as the product of q probability integrals, each of which is related to the error distribution over ei,j(tk−1) ± Δy. That is,

$P(D(t_{k-1})|m_i) = \prod_{j=1}^{q}\int_{e_{i,j}(t_{k-1})-\Delta y}^{e_{i,j}(t_{k-1})+\Delta y} f_j(y)\,dy$   (16)

where 2Δy is the error-centered integration range of fj(y).
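To make the likelihood step concrete, the following rough sketch (illustrative only; the function name, the Δy value and the single-output example are assumptions, not taken from the paper) evaluates the narrow-interval approximation of Eqs. (15)-(17) for one model from its prediction errors.

```python
import numpy as np

def approximate_likelihood(errors, sigma, delta_y=0.01):
    """Approximate P(D(t_{k-1}) | m_i) following Eqs. (15)-(17).

    errors  : array of q prediction errors e_{i,j}(t_{k-1}) = y_j - y_hat_{i,j}, Eq. (7)
    sigma   : array of q deviations sigma_j, Eq. (14)
    delta_y : half-width of the error-centered integration range in Eq. (16)
    """
    # Normal densities f_j evaluated at the errors, Eq. (15)
    f = np.exp(-0.5 * (errors / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # Narrow-interval approximation of the integrals, Eq. (17): (2*delta_y)^q * prod_j f_j
    return np.prod(2.0 * delta_y * f)

errors = np.array([0.02])    # single-output example (q = 1), made-up error value
sigma = np.array([0.04])
print(approximate_likelihood(errors, sigma))
```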


Note that for the normal distribution, if the integration range is very small, which is generally true since the prediction errors of the well-trained models are not significant, Eq. (16) can be approximated as

$P(D(t_{k-1})|m_i) \cong (2\Delta y)^q \prod_{j=1}^{q} f_j(e_{i,j}(t_{k-1}))$   (17)

Using Eq. (15), the following expression can be obtained:

$\prod_{j=1}^{q} f_j(e_{i,j}(t_{k-1})) = \prod_{j=1}^{q}\dfrac{1}{\sigma_j\sqrt{2\pi}}\exp\left(-\dfrac{1}{2}\left(\dfrac{e_{i,j}(t_{k-1})}{\sigma_j}\right)^2\right)$   (18)

The above expression is equivalent to the following by using Eq. (7):

$\prod_{j=1}^{q} f_j(e_{i,j}(t_{k-1})) = \dfrac{1}{(\sigma_j\sqrt{2\pi})^q}\exp\left(-\dfrac{1}{2}\sum_{j=1}^{q}\left(\dfrac{y_j(t_{k-1})-\hat{y}_{i,j}(t_{k-1})}{\sigma_j}\right)^2\right)$   (19)

Thus, Eq. (17) can be rewritten as

$P(D(t_{k-1})|m_i) = \left(\sqrt{\dfrac{2}{\pi}}\,\dfrac{\Delta y}{\sigma_j}\right)^q \exp\left(-\dfrac{1}{2}\sum_{j=1}^{q}\left(\dfrac{y_j(t_{k-1})-\hat{y}_{i,j}(t_{k-1})}{\sigma_j}\right)^2\right)$   (20)

4.3. Prior probability evaluation

The prior probability, P(mi(tk−1)), in Eq. (12) represents the preference for model mi before re-evaluation. Therefore, a model with better performance in its history will have a greater chance of being counted more heavily in future application. Note that without any prior knowledge of model preference, the prior probability is assumed to have a uniform distribution among the N models initially (at time t0). From time t1 on, the posterior probability of the previous time step, P(mi|D(tk−1)) (k ≥ 1), can be set as the current prior probability, P(mi(tk)). This is valid based on the consideration that a model showing better prediction performance in the past may have a better chance of performing well in the current time step. But note that P(mi|D(tk−2)) is equivalent to wi(tk−1). Thus, we have

$P(m_i(t_{k-1})) = N^{-1}, \quad k = 1$   (21a)

$P(m_i(t_{k-1})) = w_i(t_{k-1}), \quad k > 1$   (21b)

Note that the noise in process data may mislead the evaluation to some extent, and updating the prior probabilities may reduce prediction errors by giving more preference to better-performing models.

4.4. Data observation-related probability estimation

The probability P(Dy(tk−1)) concerns the occurrence of the output data. According to Bernardo and Smith (1994), the following relationship holds:

$\sum_{i=1}^{N} P(m_i|D_y(t_{k-1})) = 1$   (22)

This should be true for the application to weight assignment, as the sum of all weights should be equal to 1. According to Eq. (12), the above equation can be rewritten as

$\sum_{i=1}^{N}\dfrac{P(D_y(t_{k-1})|m_i)\,P(m_i(t_{k-1}))}{P(D_y(t_{k-1}))} = 1$   (23)

That is,

$P(D_y(t_{k-1})) = \sum_{i=1}^{N} P(m_i(t_{k-1}))\,P(D_y(t_{k-1})|m_i(t_{k-1}))$   (24)

4.5. Weight assessment formula

With the expressions in Eqs. (20), (21), and (24), Eq. (12) can be rewritten as an explicit function of the model predictions, the measured outputs and the preferred deviation. After simplification, the weight wi(tk) for model mi at time tk can finally be expressed as

$w_i(t_k) = \dfrac{\exp\left(-\frac{1}{2}\sum_{j=1}^{q}\left(\frac{y_j(t_{k-1})-\hat{y}_{i,j}(t_{k-1})}{\sigma_j}\right)^2\right)}{\sum_{i=1}^{N}\exp\left(-\frac{1}{2}\sum_{j=1}^{q}\left(\frac{y_j(t_{k-1})-\hat{y}_{i,j}(t_{k-1})}{\sigma_j}\right)^2\right)}, \quad k = 1$   (25a)

$w_i(t_k) = \dfrac{w_i(t_{k-1})\exp\left(-\frac{1}{2}\sum_{j=1}^{q}\left(\frac{y_j(t_{k-1})-\hat{y}_{i,j}(t_{k-1})}{\sigma_j}\right)^2\right)}{\sum_{i=1}^{N} w_i(t_{k-1})\exp\left(-\frac{1}{2}\sum_{j=1}^{q}\left(\frac{y_j(t_{k-1})-\hat{y}_{i,j}(t_{k-1})}{\sigma_j}\right)^2\right)}, \quad k > 1$   (25b)

Note that the weight wi(tk) takes a value only between 0 and 1. A larger value indicates more preference for the prediction by model mi in system characterization at time tk. As an extreme case, a value of 0 (or 1) means excluding (or exclusively including) the prediction by model mi in the system performance prediction.
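A minimal sketch of the weight update of Eq. (25) is given below (Python/NumPy; the names and numbers are illustrative and not from the paper). Because σj is assumed model independent, the constant factor of Eq. (20) and the evidence P(Dy(tk−1)) cancel in the normalization, so only the exponential terms and the prior weights remain.

```python
import numpy as np

def update_weights(prev_weights, y_measured, y_predicted, sigma):
    """Bayesian weight update, Eq. (25b).

    prev_weights : array (N,) of weights w_i(t_{k-1}), acting as the prior (Eq. (21b))
    y_measured   : array (q,) of measured outputs y_j(t_{k-1}), Eq. (6)
    y_predicted  : array (N, q) of model predictions y_hat_{i,j}(t_{k-1})
    sigma        : array (q,) of deviations sigma_j
    Returns the posterior weights w_i(t_k), which sum to 1.
    """
    # Exponent of the Gaussian likelihood for each model (model-independent factors cancel)
    z = -0.5 * np.sum(((y_measured - y_predicted) / sigma) ** 2, axis=1)
    unnormalized = prev_weights * np.exp(z)
    return unnormalized / unnormalized.sum()

# Example: three models, one output, sigma = 0.04 as in the case study (values are made up)
w = np.array([1 / 3, 1 / 3, 1 / 3])            # uniform initial weights
y = np.array([0.55])                           # measured film thickness
y_hat = np.array([[0.54], [0.58], [0.50]])     # individual model predictions
print(update_weights(w, y, y_hat, np.array([0.04])))
```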



5. Automated model-set management procedure

The Bayesian-based weight determination method described in the preceding section can be used to evaluate periodically the degree of applicability of the models in the model set and to create a weighted average prediction for system characterization. The method can be embedded in the following procedure for automatic, continuous management of the model set.

• Step 1. Generate model set M containing N well-trained and validated models (see Eq. (1)). The models can be structurally or parametrically different. The size of the model set can be determined by the model developer.
• Step 2. Update the procedural parameters, such as the time instant tk and the deviation σi,j. Initially, tk is set to t1, and k is increased by 1 for each repeated use of the procedure. The initial weights, wi(t0) (i = 1, 2, ..., N), are set to 1/N, and the deviation σi,j (i = 1, 2, ..., N; j = 1, 2, ..., q) can be selected based on experience (see Eq. (14) as a reference).
• Step 3. Form data set Dy(tk−1) (see Eq. (6)).
• Step 4. Generate the model prediction sets ŷi(tk) (i = 1, 2, ..., N) (see Eq. (4)).
• Step 5. Calculate the weights wi(tk) (i = 1, 2, ..., N) using Eq. (25).
• Step 6. Generate the new model-set-based prediction Ŷ(tk) using Eq. (9).
• Step 7. Return to Step 2 for system prediction at the next time instant.

Note that after each update of the model weights, the models can be used for system characterization in the period before the next weight re-evaluation. A sketch of how these steps could be organized as a loop is given below.
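The loop below is one possible way to organize Steps 1-7 (a sketch under the assumption of a single output vector per step and sparse measurements; the helper names and data-access conventions are hypothetical and not part of the paper).

```python
import numpy as np

def manage_model_set(models, x_stream, y_stream, sigma):
    """Sketch of the automated model-set management loop (Steps 1-7).

    models   : list of N trained model callables (Step 1)
    x_stream : sequence of input vectors D_x(t_k), k = 1, 2, ...
    y_stream : sequence of measured outputs for the previous step, or None when
               no measurement was taken (Step 3)
    sigma    : array (q,) of deviations sigma_j (Step 2)
    Yields the weighted model-set prediction Y_hat(t_k) at every step (Step 6).
    """
    n = len(models)
    weights = np.full(n, 1.0 / n)        # Step 2: initial weights set to 1/N
    prev_preds = None
    for x, y_prev in zip(x_stream, y_stream):
        if y_prev is not None and prev_preds is not None:
            # Step 5: Bayesian weight update, Eq. (25b), using the previous step's data
            z = -0.5 * np.sum(((y_prev - prev_preds) / sigma) ** 2, axis=1)
            weights = weights * np.exp(z)
            weights /= weights.sum()
        prev_preds = np.array([m(x) for m in models])   # Step 4: individual model predictions
        yield weights @ prev_preds                       # Step 6: weighted prediction, Eq. (9)
        # Step 7: continue with the next time instant

# Illustrative usage with two placeholder models and one measurement in every five steps
models = [lambda x: np.array([x.mean()]), lambda x: np.array([1.05 * x.mean()])]
xs = [np.full(5, 0.5) for _ in range(10)]
ys = [np.array([0.5]) if k % 5 == 0 else None for k in range(10)]
predictions = list(manage_model_set(models, xs, ys, sigma=np.array([0.04])))
```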

6. Case study

In automotive paint shops, quality control (QC) of vehicle coating has traditionally been realized in an open-loop fashion, i.e., the quality is controlled through inspection of the final product. Such a type of QC is methodologically passive. Filev (2002) introduced a closed-loop QC approach in which a Jacobian matrix model is used in film-thickness closed-loop control. To describe the nonlinear, multivariable process more comprehensively, a number of NNs have been introduced to correlate process and product performance variables in paint spray (Lou & Huang, 1999; Zhou, 2002). Nevertheless, due to operational uncertainties, it is recognized that closed-loop control using a single model is feeble in adapting to time-varying operating conditions (Li, 2004). The automatic model-set management procedure introduced in this work is suitable for a variety of applications. Automotive coating thickness prediction is an excellent example for demonstrating the effectiveness of the method in real-time system characterization.

6.1. Process and prediction task

In an automotive paint shop, paint is sprayed onto all the panel surfaces of each vehicle by a set of computer-controlled spray bells. The bells are mounted around the vehicle as shown in Fig. 2. The paint spray operation is determined by the settings of five key parameters: (a) fluid flow rate, (b) shaping air (controlling the spray angle), (c) air temperature, (d) air humidity, and (e) downdraft air flow rate. A key coating quality parameter is the film thickness on a panel, which can be measured at many locations on each panel. In practice, the measurement of dry film thickness is a time-consuming task. In a paint shop producing 300-500 vehicles per shift, usually only one vehicle (i.e., 0.2-0.33% of vehicles) is sampled for thickness measurement. The most frequent measurement of film thickness in industry today could be one out of five vehicles. It is highly desirable, therefore, that a reliable film thickness prediction model be available for estimating the film thickness on all unmeasured vehicles. To improve model prediction accuracy, the scarcely available measured film thickness data can be used to update the model weights.

Fig. 2. Sketch of topcoat spray bell settings.

In this case, the process studied provides the following information for a 150-vehicle coating operation (representing a 4-h production period): (i) 150 sets of spray-related parameter data (for model input, numbered nos. 1-150), and (ii) 30 sets of average film thickness data (one in every five vehicles, for model output, numbered nos. 1, 6, 11, ..., 146). The target is to predict the average film thickness of all 150 vehicles, where the prediction quality is examined by comparing the predictions with the 30 sets of measured data.

6.2. Individual models and performance analysis

To accomplish the task, a set of six NN models was selected from a total of 20 models. Each model has five inputs and one output, i.e.,

$\hat{y} = f_{NN}(x_1, x_2, \ldots, x_5)$   (26)

where x1 is the paint fluid flow rate; x2 the shaping air; x3 the booth temperature; x4 the booth humidity; x5 the downdraft flow; and ŷ the predicted average paint film thickness. The models were trained on 90 data sets and validated on 30 other sets (Li, 2004). The six models were identified as equally good based on the training/validation criterion (an average model error of less than 5% in validation). In Table 1, the first column lists the six models, which have different numbers of hidden nodes and/or different connection weights if the structures are the same. The second and third columns list, respectively, the average training error and the average validation error of each model. As shown, Model F gives the smallest average training error (1.71%), but Model A shows the smallest average validation error (3.19%). Overall, the six models demonstrate very close performance.
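For reference, a model of the form of Eq. (26) is a small feedforward network. The sketch below shows a forward pass for a 5-3-1 structure such as Model A; the tanh activation and the random (untrained) weights are placeholders, since the paper does not report the trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 5)), np.zeros(3)   # 5 inputs -> 3 hidden nodes
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # 3 hidden nodes -> 1 output

def f_nn(x):
    """Forward pass y_hat = f_NN(x1, ..., x5), Eq. (26), with tanh hidden units (assumed)."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

# Made-up spray settings: flow rate, shaping air, temperature, humidity, downdraft
x = np.array([0.6, 0.4, 0.5, 0.55, 0.45])
print(f_nn(x))   # predicted average film thickness (untrained placeholder output)
```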



Table 1
Model prediction error analysis

Model (5 inputs, 1 output)              Average training error        Average test error            Average prediction error
                                        (using 90 data sets) (%)      (using 30 data sets) (%)      (using new 30 data sets^a) (%)
A (3 hidden nodes)                      1.98                          3.19                          3.31
B (3 hidden nodes)                      1.98                          3.92                          4.41
C (4 hidden nodes)                      1.92                          3.33                          3.35
D (4 hidden nodes)                      2.01                          4.13                          4.73
E (5 hidden nodes)                      1.89                          4.51                          4.01
F (5 hidden nodes)                      1.71                          3.77                          2.72
Model averaging                         -                             -                             3.22
Model weighted averaging (σ = 0.04)     -                             -                             2.21

a These 30 sets of data are obtained during the model-set application. A total of 150 sets are collected, including 150 sets of model input data (numbered nos. 1-150) and 30 sets of film thickness data (model output) (numbered nos. 1, 6, 11, ..., 146).

The six models were used individually to predict the film thickness of the 150 vehicles, with only the 30 sets of measurement data available for evaluating their individual prediction errors. The fourth column of Table 1 lists the average prediction error of each model in this application. Model D, for example, has the largest error (4.73%), indicating the poorest performance in application, even though it demonstrated promising performance in model validation. The other models also show prediction performance different from that observed during model validation.

6.3. Model-set application

The Bayesian-based model-set management method is used to operate a model set containing the six models. As shown in Fig. 1, the introduced method is used to predict the average film thickness of all 150 vehicles from the 150 sets of spray parameters (nos. 1-150). As the 30 sets of average film thickness measurement data (nos. 1, 6, 11, ..., 146) become available, the model weights are periodically re-evaluated.

Fig. 3 depicts the change of the weight associated with each model during the 30 weight updates. According to Eq. (9), if model mi has a larger weight wi(tk) at time tk, then that model's prediction plays a more important role in system characterization. Note that the weights assigned to the six models change every time the Bayesian-based weight update method is applied. As shown, during the first nine weight updates, Model C has the largest weight. Model A then takes the lead until the 12th update. Between the 13th and 16th updates, Model F gains the most significance. After that, Model A regains its place for a short period, and then Model F becomes the most valuable one until the last update. In this application, the six models compete with each other, with A and F appearing superior to the others most of the time. This is also supported by Table 1, where A and F have the smallest prediction errors.

Fig. 3. Weight update in model applications (σ = 0.04).

As stated, the average film thickness data are measured for one in every five vehicles. In Fig. 4, the average film thickness of the 150 vehicles is predicted using the model set (see the solid line). The 30 sets of measured data are plotted as circles; these data were used to adjust the model weights periodically. For comparison, the prediction by the best individual model, Model F, is also plotted in the figure. Table 1 shows a comparison of model predictions by the usual model averaging method and by the Bayesian-based method: the prediction error of the former is 3.22%, while that of the latter is 2.21% (σ = 0.04).

Fig. 4. Vehicle film thickness prediction (σ = 0.04).



7. Discussion

For a successful use of the model-set management procedure, two issues need to be discussed. The first is how to use the procedure when more than one set of data is available each time the model weights are to be re-evaluated; the second is how to select a value for the deviation σ, which is directly related to the calculation of the weights (see Eq. (25)).

7.1. Weight update scheme

In the case study, the weights for the model set were updated by running the model-set management procedure with only one set of measured data at each scheduled update time. But if more than one set of measured data (say, r sets) is available at the time of a weight update, then it is desirable to use all r sets of data, which should generate more reliable weights. In this case, the model-set management procedure can be run r times: each time, the weights are updated by one set of data, the updated weights are updated again using the next set of available data, and so on. It can be proven that the final weights obtained by running the procedure r times (one set of data used in each run) are the same as those derived by running the procedure only once with all r sets of data used at one time. This equivalence is shown below.

According to Eq. (12), the weight associated with model mi at time t2 is

$w_i(t_2) = P(m_i|D(t_1)) = \dfrac{P(m_i(t_1))\,P(D_y(t_1)|m_i)}{P(D_y(t_1))}$   (27)

Using Eq. (21b), weight wi(t3) can be expressed as

$w_i(t_3) = \dfrac{w_i(t_2)\,P(D(t_2)|m_i)}{P(D(t_2))}$   (28)

Substituting Eq. (27) into Eq. (28) yields

$w_i(t_3) = \dfrac{P(m_i(t_1))\,P(D(t_1)|m_i)\,P(D(t_2)|m_i)}{P(D(t_1))\,P(D(t_2))}$   (29)

Running the procedure r times gives

$w_i(t_{r+1}) = \dfrac{P(m_i(t_1))\,P(D(t_1)|m_i)\cdots P(D(t_r)|m_i)}{P(D(t_1))\cdots P(D(t_r))}$   (30)

If the observed data are independent of each other, Eq. (30) becomes

$w_i(t_{r+1}) = \dfrac{P(m_i(t_1))\,P(D(t_1),\ldots,D(t_r)|m_i)}{P(D(t_1),\ldots,D(t_r))}$   (31)

This expression can be rewritten as

$w_i(t_{r+1}) = \dfrac{w_i(t_1)\prod_{k=1}^{r}\exp\left(-\frac{1}{2}\sum_{j=1}^{q}\left(\frac{y_j(t_k)-\hat{y}_{i,j}(t_k)}{\sigma_j}\right)^2\right)}{\sum_{i=1}^{N} w_i(t_1)\prod_{k=1}^{r}\exp\left(-\frac{1}{2}\sum_{j=1}^{q}\left(\frac{y_j(t_k)-\hat{y}_{i,j}(t_k)}{\sigma_j}\right)^2\right)}, \quad k > 1;\; r > 1$   (32)

Thus, if more than one data set is available for weight updating (i.e., r > 1), Eq. (25b) can be replaced by Eq. (32). The benefit is that the model-set management procedure needs to be run only once to update all the weights when r > 1. In application, the number of observed data sets (r) used for weight updating is determined by the data sampling frequency, sampling cost, control requirements, etc.

7.2. Selection of deviation value

The deviation σ in Eq. (13) can have a major influence on the evaluation. As shown in Eq. (25), the weight assigned to each model is a function of σ (Table 2). Fig. 5 depicts the σ-dependent change of the prediction error for the estimation of the average film thickness on the vehicles in the case study. It shows that the prediction error reaches its minimum when σ is 0.04. For 0.005 < σ < 0.125, the prediction error by the model set lies between 2.21% and 2.72%; otherwise, the prediction error is greater than that of Model F. This indicates that the selection of σ is an optimization task.

Table 2
Weights of the models under different σ's

σ        Weights after 30 time steps
         Model A     Model B     Model C     Model D     Model E     Model F
0.15     0.2670      0.0858      0.1800      0.0537      0.0984      0.3151
0.04     0.0923      0           0.0004      0           0           0.9072

Fig. 5. Prediction error by the model set using different deviation σ.

One way to select σ is based on the model-set prediction error. The optimization task can be defined as

$\min_{\sigma} J = \dfrac{1}{K}\sum_{k=1}^{K}\left|\dfrac{Y(t_k)-\hat{Y}(t_k)}{Y(t_k)}\right|$   (33)

$\text{s.t.} \quad \sigma \in [\sigma_{\min}, \sigma_{\max}]$   (34)

where Y(tk) and Ŷ(tk) are, respectively, the measured and predicted output vectors at time tk, and K is the number of data sets to be used for updating the σ value.
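One simple way to carry out the optimization of Eqs. (33)-(34) is a one-dimensional search over candidate σ values. The sketch below (illustrative only; not the authors' implementation, and all names are assumptions) scores each candidate by the average relative prediction error of Eq. (33), re-running the weight update of Eq. (25b) over the K available measurements for the single-output case.

```python
import numpy as np

def select_sigma(candidates, y_measured, model_preds):
    """Pick sigma in [sigma_min, sigma_max] minimizing J of Eq. (33).

    candidates  : 1-D array of trial sigma values (the constraint set of Eq. (34))
    y_measured  : array (K,) of measured outputs Y(t_k) (single-output case, q = 1)
    model_preds : array (K, N) of the N individual model predictions at the K sample times
    """
    best_sigma, best_j = None, np.inf
    for sigma in candidates:
        w = np.full(model_preds.shape[1], 1.0 / model_preds.shape[1])   # start from uniform weights
        rel_errors = []
        for y, preds in zip(y_measured, model_preds):
            y_hat = w @ preds                          # model-set prediction before the update, Eq. (9)
            rel_errors.append(abs((y - y_hat) / y))
            z = -0.5 * ((y - preds) / sigma) ** 2      # weight update, Eq. (25b)
            w = w * np.exp(z)
            w = w / w.sum()
        j = float(np.mean(rel_errors))                 # objective J, Eq. (33)
        if j < best_j:
            best_sigma, best_j = sigma, j
    return best_sigma, best_j
```

A call such as select_sigma(np.linspace(0.005, 0.15, 30), y, preds) corresponds to the kind of σ sweep summarized in Fig. 5 and Table 2.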


The σ-value optimization model in Eqs. (33) and (34) can be solved as needed. It can also be integrated into the model-set management procedure (in Step 2). Note that as σ increases toward positive infinity, the Bayesian-based method becomes equivalent to the usual model averaging method. Fig. 5 shows that the prediction error curve approaches the value obtained by model averaging (3.22%) as σ increases.

8. Concluding remarks

In model-based system characterization, the use of a model set may have advantages over the use of a single model. Usually, a model-set-based prediction is more precise, smoother and more robust, provided the models in the model set are well trained and validated and they are managed properly. The difficulty of model-set management lies in determining how much the prediction of each individual model should be accepted in system characterization. In this paper, a Bayesian-based weight determination method is introduced, which can be used to update the weights periodically in on-line applications. The periodic update of the model-set weights by the model-set management procedure can be automated, which helps ensure high-precision predictions even if the process system undergoes disturbances and fluctuations. The case study has demonstrated the attractiveness of the method.

Although the application example in this paper involves data that are treated as time-independent, the Bayesian-based method is not restricted to such cases; it also applies when the data are time dependent. As long as a dynamic model can describe the process dynamics properly, the method is still applicable. The introduced Bayesian-based method is general for any model-set-based system characterization. The model set can contain any number and any type of models, such as fundamental, empirical, steady-state, or dynamic models.

Acknowledgments

This work is in part supported by NSF (CTS 0091398) and the Institute of Manufacturing Research of Wayne State University.

References

Alippi, C. (2002). Selecting accurate, robust, and minimal feedforward neural networks. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 49, 1799-1810.
Bernardo, J. M., & Smith, A. F. M. (1994). Bayesian theory. Chichester: Wiley.
Busemeyer, J. R., & Wang, Y. M. (2000). Model comparisons and model selections based on generalization criterion methodology. Journal of Mathematical Psychology, 44, 171-189.
Dayholf, J. (1992). Neural network architectures. New York, NY: Van Nostrand Reinhold.
Denison, D. G. T., Holmes, C. C., Mallick, B. K., & Smith, A. F. M. (2002). Bayesian methods for nonlinear classification and regression. New York, NY: Wiley.
Filev, D. P. (2002). Applied intelligent control—Control of automotive paint process. In Proceedings of the 2002 world congress on computational intelligence (Vol. 1, pp. 1-6).
Forster, M. R. (2000). Key concepts in model selection: Performance and generalizability. Journal of Mathematical Psychology, 44, 205-231.
Hamm, L., Brorsen, B. W., & Hagan, M. T. (2002). Global optimization of neural network weights. In Proceedings of the 2002 international joint conference on neural networks (Vol. 2, pp. 1228-1233).
Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14, 382-417.
Huang, Y. L., Edgar, T. F., Himmelblau, D. M., & Trachtenberg, I. (1994). Constructing a reliable neural network model for a plasma etching process using limited experimental data. IEEE Transactions on Semiconductor Manufacturing, 7, 333-344.
Kwok, T. Y., & Yeung, D. Y. (1997). Constructive algorithms for structure learning in feedforward neural networks for regression problems. IEEE Transactions on Neural Networks, 8, 630-645.
Lampinen, J., & Vehtari, A. (2001). Bayesian approach for neural networks—Review and case studies. Neural Networks, 14, 7-24.
Levin, E., Tishby, N., & Solla, S. A. (1990). A statistical approach to learning and generalization in layered neural networks. Proceedings of the IEEE, 78, 1568-1574.
Li, J. (2004). Adaptive modeling, integrated optimization, and control of automotive paint spray processes. Master's thesis, Wayne State University, Detroit, MI.
Lou, H. H., & Huang, Y. L. (1999). Neural network-based soft sensor for the prediction and improvement of clearcoat filmbuild. In Proceedings of the AIChE national annual meeting.
Mackey, D. J. C. (1992a). Bayesian interpolation. Neural Computation, 4, 415-447.
Mackey, D. J. C. (1992b). A practical Bayesian framework for backpropagation networks. Neural Computation, 4, 448-472.
Meireles, M. R. G., Almeida, P. E. M., & Simoes, M. G. (2003). A comprehensive review for industrial applicability of artificial neural networks. IEEE Transactions on Industrial Electronics, 50, 585-601.
Petridis, V., & Kehagias, A. (1996). Modular neural networks for MAP classification of time series and the partition algorithm. IEEE Transactions on Neural Networks, 7, 73-86.
Petridis, V., & Kehagias, A. (1998). A multi-model algorithm for parameter estimation of time varying nonlinear systems. Automatica, 34, 469-475.
Petridis, V., Kehagias, A., Petrou, L., Bakirtzis, A., Maslaris, N., Kiartzis, S., et al. (2001). A Bayesian multiple models combination method for time series prediction. Journal of Intelligent and Robotic Systems, 31, 69-89.
Porto, V. W., Fogel, D. B., & Fogel, L. J. (1995). Alternative neural network training methods. IEEE Expert, 10, 16-22.
Raftery, A. E., Madigan, D., & Hoeting, J. A. (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association, 92, 179-191.
Reed, R. (1993). Pruning algorithms—A review. IEEE Transactions on Neural Networks, 4, 740-747.
Sato, M. (2001). Online model selection based on the variational Bayes. Neural Computation, 13, 1649-1681.
Vila, J. P., Wagner, V., & Neveu, P. (2000). Bayesian nonlinear model selection and neural networks: A conjugate prior approach. IEEE Transactions on Neural Networks, 11, 265-278.
Wasserman, L. (2000). Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44, 92-107.
Yao, X. (1999). Evolving artificial neural networks. Proceedings of the IEEE, 87, 1423-1447.
Zhou, Q. (2002). System optimization of integrated process plants for environmentally benign manufacturing. Ph.D. thesis, Wayne State University, Detroit, MI.
