Data based modelling for rapid process understanding and design under uncertainty

Tao Chen, Yanhui Yang

School of Chemical and Biomedical Engineering, Nanyang Technological University, 62 Nanyang Drive, Singapore 637459, Singapore (e-mail: [email protected]).

Abstract: We consider an off-line process design problem where the response variable is affected by several factors. We present a data-based modelling approach that iteratively allocates new experimental points, updates the model, and searches for the optimal process factors. A flexible non-linear modelling technique, the Gaussian process (GP), forms the cornerstone of this approach. A GP model is capable of providing an accurate predictive mean and variance, the latter being a quantification of its prediction uncertainty. We present a strategy for utilizing this prediction uncertainty through application to the design of a catalytic reaction system. Finally, we describe a sensitivity analysis approach to model interpretation, which is similar to analysis of variance (ANOVA) in traditional response surface methodology.

Keywords: Design of experiments, Gaussian process model, model interpretation, process optimization, response surface methodology, uncertainty.

1. INTRODUCTION

Mathematical models are the foundation of the systems approach to the design of chemical and other processes (Klatt and Marquardt, 2009). Models can be developed through the representation of the fundamental principles that govern the process, and thus they are termed first-principles or mechanistic models. Alternatively, models may be based purely on experimental data, in which case they are called data-based or empirical. Although data-based models are typically reliable only within the operating region where the data are collected, they have seen wide application due to the simplicity of model development and implementation.
This is especially true if the process is still in its early design stage, when the time and resources needed for mechanistic modelling are hardly justifiable.

This study is further restricted to batch-wise (as opposed to time-dependent) modelling that relates the process response ($y$, e.g. product yield) to the operating factors ($x = [x_1, \ldots, x_d]^T$, e.g. reaction temperature and pressure), where $d$ is the number of factors. These models are typically used in off-line design tasks to facilitate the understanding and optimization of the process. This is in contrast to dynamic process models, which are mainly used for on-line control and optimization purposes.

Such a data-based model is the central component of the so-called response surface methodology (RSM) for rational process design (Myers and Montgomery, 1995). The traditional method in RSM is to fit a polynomial function (typically linear, quadratic or cubic) to the experimental data, followed by identifying the process factors that optimize the objective function. However, the prediction accuracy of polynomial regression is usually unsatisfactory due to the restrictive functional form, and consequently the model-based process understanding and optimization may be unreliable. To address this issue, flexible non-linear models have been applied to provide a more accurate approximation of the process behaviour, such as the artificial neural network (ANN) (Shao et al., 2007), the support vector machine (SVM) (Hadjmohammadi and Kamel, 2008), and Gaussian process (GP) regression (Yuan et al., 2008). GP models are particularly attractive since they not only attain high prediction accuracy, but also quantify the uncertainty of the prediction. As a consequence, the model-based optimization problem can be formulated to account for the uncertainty, and the identified optimal process factors are expected to be more robust against modelling uncertainty.

This paper extends the previous GP-based process optimization (Yuan et al., 2008) to an iterative approach, whereby the GP model is used to help search for the best process performance incrementally. We adopt a worst-case approach to handle the prediction uncertainty. We further present a statistical procedure, based on sensitivity analysis, to interpret how the process response is affected by the factors. The proposed method is successfully applied to a catalytic reaction system that converts trans-stilbene to epoxides, which are important intermediates in the chemical and pharmaceutical industries.

2. ITERATIVE MODELLING AND OPTIMIZATION

The overall approach is illustrated in Fig. 1. Initially, statistical design of experiments (DoE) is applied to give the initial design points for conducting experiments. The experimental data are used to develop a GP model that relates the process response to the factors. Based on the model, we allocate new design points by maximizing a certain objective function, and new experiments are conducted to update the model. We discuss the various components of this algorithm in this section.

Fig. 1. The flowchart of the proposed algorithm.

2.1 Initial experimental design

Initial DoE is required to obtain the data for empirical modelling. The classical fractional factorial and central composite designs were proposed to investigate the interactions of process factors based on polynomial models (Myers and Montgomery, 1995). These classical designs typically assign two or three pre-determined levels to each process factor, and experiments are conducted at combinations of the levels of different factors. Using a small number of levels is especially appealing if the factors' values are difficult to change in practice. However, this strategy may not cover the design space well because only a few levels of each factor are studied, and thus it may result in a less reliable empirical model. The recognition of this disadvantage of classical DoE methods has motivated the concept of "space-filling" designs that allocate design points to be uniformly distributed within the range of each factor (McKay et al., 1979). Among this class of designs, Latin hypercube sampling (LHS) (McKay et al., 1979) has received wide application as a result of its simple implementation and good performance. LHS is adopted in this study for the initial DoE.

2.2 Overview of Gaussian process model

GP, also termed the kriging model in the literature with a slightly different formulation (Sacks et al., 1989), is a flexible modelling technique that can be used for both regression and classification purposes (Rasmussen and Williams, 2006). The GP regression model can be derived from the perspectives of ANN and Bayesian non-parametric regression; see (Rasmussen and Williams, 2006) for details.

In this subsection a brief overview of the Gaussian process regression model is given, including the formulation and implementation of the model. Consider a data set with $n$ data points, $\{x^{(i)}, y^{(i)};\ i = 1, \ldots, n\}$, where we use superscripts to index data points and subscripts to index dimensions. A GP for regression is defined such that the regression function $y(x)$ has a Gaussian prior distribution with zero mean, or in discrete form:

$$ y = (y^{(1)}, \ldots, y^{(n)})^T \sim G(0, C) \tag{1} $$

where $C$ is an $n \times n$ covariance matrix of which the $ij$-th element is defined by the covariance function $C(x^{(i)}, x^{(j)})$. An example of such a covariance function is:

$$ C(x^{(i)}, x^{(j)}) = a_0 + a_1 \sum_{k=1}^{d} x_k^{(i)} x_k^{(j)} + v_0 \exp\left( -\sum_{k=1}^{d} w_k \left( x_k^{(i)} - x_k^{(j)} \right)^2 \right) + \delta_{ij} \sigma^2 \tag{2} $$

where $x^{(i)} = (x_1^{(i)}, \ldots, x_d^{(i)})$, and $\delta_{ij} = 1$ if $i = j$ and zero otherwise. We denote $\theta = (a_0, a_1, v_0, w_1, \ldots, w_d, \sigma^2)$ as the "hyper-parameters" defining the covariance function. The hyper-parameters must be non-negative to ensure that the covariance matrix is non-negative definite. The term "hyper-parameter" is used to differentiate Gaussian processes from parametric regression, where the parameters of the regression function itself are estimated.

For the covariance function in eq. (2), the first two terms represent a constant bias (offset) and a linear correlation term, respectively. The exponential term is similar in form to a radial basis function, and it accounts for the potentially strong correlation between the outputs for nearby inputs. The term $\sigma^2$ captures the random error effect. By combining both linear and non-linear terms in the covariance function, the Gaussian process is capable of handling both linear and non-linear data structures. Other forms of covariance functions are discussed in (Rasmussen and Williams, 2006).

The prediction at a new data point $x^*$ is also Gaussian distributed, $y^* \sim G(\hat{y}^*, \sigma^2_{\hat{y}^*})$, with

$$ \hat{y}^* = k^T(x^*)\, C^{-1} y \tag{3} $$

$$ \sigma^2_{\hat{y}^*} = C(x^*, x^*) - k^T(x^*)\, C^{-1} k(x^*) \tag{4} $$

where $k(x^*) = [C(x^*, x^{(1)}), \ldots, C(x^*, x^{(n)})]^T$. The capability to provide the prediction uncertainty in terms of the variance is an important feature of the Gaussian process for robust process design (Yuan et al., 2008).

The hyper-parameters $\theta$ can be estimated by maximizing the following log-likelihood function using optimization algorithms:

$$ L = -\frac{1}{2} \log \det C - \frac{1}{2} y^T C^{-1} y - \frac{n}{2} \log 2\pi \tag{5} $$
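To make eqs. (2) to (5) concrete, the following is a minimal sketch in plain Python, not the Matlab implementation used in the study. All function names are hypothetical, the hyper-parameters are passed as a tuple `theta = (a0, a1, v0, w, sigma2)` with `w` a list of length $d$, and the noise term $\sigma^2$ is assumed to be included in $C(x^*, x^*)$ when evaluating eq. (4). Linear systems are solved through a Cholesky factorization rather than an explicit matrix inverse.

```python
import math


def cov(xi, xj, theta, same_point=False):
    """Covariance function of eq. (2); same_point plays the role of delta_ij."""
    a0, a1, v0, w, sigma2 = theta
    linear = sum(p * q for p, q in zip(xi, xj))
    sqdist = sum(wk * (p - q) ** 2 for wk, p, q in zip(w, xi, xj))
    noise = sigma2 if same_point else 0.0
    return a0 + a1 * linear + v0 * math.exp(-sqdist) + noise


def cov_matrix(X, theta):
    """n x n covariance matrix C of eq. (1)."""
    n = len(X)
    return [[cov(X[i], X[j], theta, i == j) for j in range(n)] for i in range(n)]


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(X, y, theta, x_star):
    """Predictive mean, eq. (3), and variance, eq. (4), at x_star."""
    L = cholesky(cov_matrix(X, theta))
    k = [cov(x_star, xi, theta) for xi in X]
    mean = sum(ki * ai for ki, ai in zip(k, chol_solve(L, y)))
    # Assumption: C(x*, x*) includes the noise term sigma^2.
    prior_var = cov(x_star, x_star, theta, same_point=True)
    var = prior_var - sum(ki * vi for ki, vi in zip(k, chol_solve(L, k)))
    return mean, var


def log_likelihood(X, y, theta):
    """Log-likelihood of eq. (5); log det C = 2 * sum(log L_ii)."""
    L = cholesky(cov_matrix(X, theta))
    n = len(y)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = sum(yi * ai for yi, ai in zip(y, chol_solve(L, y)))
    return -0.5 * log_det - 0.5 * quad - 0.5 * n * math.log(2.0 * math.pi)
```

With a very small noise variance the predictive mean at a training input should reproduce the observed response almost exactly, and the predictive variance should grow as $x^*$ moves away from the data, which is the behaviour that the worst-case strategy in Section 2.3 relies on.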

A conjugate gradient method is usually used to find the hyper-parameters that maximize the log-likelihood in eq. (5) (Rasmussen and Williams, 2006). A Matlab implementation of the GP model is publicly available from http://www.gaussianprocess.org/gpml/, and it was used to produce the results in this study.

Table 1. Process factors considered to study the trans-stilbene epoxidation process.

Process factor                                        Range
Temperature, $x_1$ (°C)                               60 - 120
Partial pressure of oxygen, $x_2$ (bar)               0.2 - 0.8
Initial stilbene concentration, $x_3$ (mmol/15 mL)    1 - 5
Stirring rate, $x_4$ (rpm)                            200 - 1250
Reaction time, $x_5$ (min)                            30 - 240

2.3 Model-based region-search and optimization

The GP model that relates the process response to the factors provides the basis to guide the search for more promising process factors. The salient feature of the GP model is the availability of the predictive variance, which should be accounted for in the optimization framework. In this study, we adopt a worst-case approach by maximizing the lower bound of the response predicted by the GP model (supposing that the objective is to maximize the response variable). That is,

$$ x_{\mathrm{new}} = \arg\max_{x^*} \left( \hat{y}^* - \alpha \sigma_{\hat{y}^*} \right), \quad \text{s.t. } x^* \in S \tag{6} $$

where $S$ denotes the factors' range, and $\alpha$ is a weighting coefficient, e.g. $\alpha = 1.645$ corresponds to the 95% lower bound of the prediction. The above optimization problem can be solved using sequential quadratic programming or other methods, depending on the properties of the specific problem.

An alternative approach is to allocate multiple experimental points for the new iteration. This is relevant in practical situations where the scheduling of equipment and instruments favours conducting several experiments within a certain period of time, or where a high-throughput facility is available to run multiple experiments simultaneously. To suit such situations, we allocate new design points to the region defined as follows:

$$ \{ x^* : \hat{y}^* - \alpha \sigma_{\hat{y}^*} > b \ \text{AND} \ \sigma_{\hat{y}^*} > c \ \text{AND} \ x^* \in S \} \tag{7} $$

Essentially, with $\alpha = 1.645$ this searches for the factors such that the 95% lower bound of the response is greater than $b$. Furthermore, we add the constraint $\sigma_{\hat{y}^*} > c$ to avoid allocating design points to a well-explored region, where the GP model is quite certain about its prediction (i.e. $\sigma_{\hat{y}^*}$ is small) and thus further experiments are not necessary. The values of $b$ and $c$ should be adjusted at each iteration based on the information from the experimental data.
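The two allocation rules, eqs. (6) and (7), can be sketched as follows. This is a simplified stand-in under stated assumptions rather than the procedure used in the study: instead of sequential quadratic programming or incremental LHS, it screens a plain Latin hypercube candidate set, picks the candidate with the largest lower bound for eq. (6), and keeps the candidates that fall inside the region of eq. (7). The predictor `toy_predict` is a hypothetical placeholder for a fitted GP that returns the predictive mean and standard deviation, and all other names are illustrative as well.

```python
import random

ALPHA = 1.645  # one-sided 95% lower bound, as in the text


def latin_hypercube(n, bounds, rng):
    """Draw n points with exactly one point per stratum along every factor."""
    columns = []
    for lo, hi in bounds:
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([lo + (s + rng.random()) / n * (hi - lo) for s in strata])
    return [list(point) for point in zip(*columns)]


def lower_bound(mean, std, alpha=ALPHA):
    """Objective of eq. (6): predicted mean minus alpha standard deviations."""
    return mean - alpha * std


def best_candidate(predict, candidates, alpha=ALPHA):
    """Approximate eq. (6) by taking the best point of a finite candidate set."""
    return max(candidates, key=lambda x: lower_bound(*predict(x), alpha=alpha))


def in_region(mean, std, b, c, alpha=ALPHA):
    """Membership test for the region of eq. (7)."""
    return lower_bound(mean, std, alpha) > b and std > c


def toy_predict(x):
    """Hypothetical stand-in for a fitted GP: returns (mean, std)."""
    mean = 50.0 - (x[0] - 3.0) ** 2
    std = 1.0 + 0.5 * abs(x[0] - 3.0)
    return mean, std


rng = random.Random(0)
candidates = latin_hypercube(2000, [(0.0, 6.0)], rng)
x_new = best_candidate(toy_predict, candidates)
region = [x for x in candidates if in_region(*toy_predict(x), b=40.0, c=1.5)]
```

In the toy example the lower bound peaks where the mean is high and the uncertainty is low, while the region of eq. (7) deliberately excludes that neighbourhood because the standard deviation there is below `c`. This illustrates why the two rules suit different purposes: eq. (6) exploits the current model, whereas eq. (7) also directs experiments to where the model is still uncertain.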

Clearly, there are an infinite number of design points that satisfy eq. (7). Hence the region search essentially becomes a constrained DoE problem, where the factors' values are selected to be uniformly distributed in the constrained space defined by eq. (7). This constrained DoE problem can be solved by using incremental LHS, as detailed in (Tang et al., 2010).

3. CASE STUDY

We consider the optimization of a lab-scale catalytic reaction to achieve the maximal conversion rate in the epoxidation of trans-stilbene over a Co$^{2+}$-NaX (cobalt ion-exchanged faujasite zeolite) catalyst. The catalyst was synthesized in-house using ion-exchange techniques. Five process factors are considered: reaction temperature, partial pressure of oxygen, initial trans-stilbene concentration, stirring rate and reaction time. The range of these factors to be explored is listed in Table 1.

3.1 Results

In the initial iteration, the knowledge about the process is relatively limited, and the LHS algorithm is used to obtain 20 design points within the whole range of the five factors for experiments. After experimentation, the corresponding conversion rates are obtained for GP modelling. The distribution of these 20 conversion values is illustrated in Fig. 2 using a box-plot. Clearly, the initial results are not satisfactory. Although two runs gave conversion rates as high as 55%, the median conversion is only 6.2%. Subsequently, we will demonstrate the use of the GP model to guide the new experiments in order to find better process performance.

Fig. 2. Box-plot of conversion rate at two iterations (x-axis: iteration; y-axis: conversion, %).

Before the model is applied for subsequent region-searching or optimization purposes, its predictive capability should be assessed by cross-validation. In this study, we adopt leave-one-out cross-validation (LOOCV) to validate the GP regression model, and the results are given in Table 2 in terms of the root mean squared error (RMSE) and the coefficient of determination ($R^2$). For the purpose of comparison, a conventional multiple quadratic polynomial regression model with stepwise variable selection is also developed. Clearly, the GP model attains higher prediction accuracy than the quadratic regression (RMSE of 5.42 versus 7.66 at iteration 1). A GP model is then developed from all the available 20 data points, and this model is used to search for better process performance in the next iteration.

Table 2. LOOCV results.

                     GP               Quadratic
          No. data   RMSE    R2       RMSE    R2
Iter 1    20         5.42    0.85     7.66    0.70
Iter 2    40         3.77    0.96     5.31    0.92

Due to the scheduling arrangement of our lab, we chose to allocate 20 new experimental points for the next iteration according to eq. (7). Since only four experiments in the first iteration attained conversion rates higher than 15%, we set $b$ = 15% in eq. (7), in the hope of exploring the factors' region with conversion higher than 15%. In addition, we set $c$ to be the average standard deviation of the predictions in the LOOCV procedure, so as to generate design points that are not well predicted by the current model. Based on these choices, the incremental LHS algorithm generates a new set of 20 design points, and the obtained conversion rates are illustrated in Fig. 2. The figure confirms that the proposed framework has successfully identified a more promising region of the process factors. The median and maximal conversion rates at iteration 2 have improved to 31.1% and 61.1%, respectively. In addition, recall that the objective of the region search in iteration 1 is to find the process factors with conversion higher than 15%. This objective has been fulfilled for 18 out of 20 experiments at iteration 2. Therefore, it appears that the GP model is reasonably reliable for predicting the process response variable.

In principle, the proposed method can be iterated multiple times as required, and the number of iterations should be decided by experienced experimenters after careful examination of the results. In this study, we restrict the number of iterations to two. Indeed, the specialists in catalytic reactions also feel that the factors identified so far may be close to the optimal condition achievable in the current experimental environment.

To enable the optimization in the final iteration, a new GP regression model is required to approximate the response surface, using all the 40 data points available. Again, the LOOCV results from GP are better than those from the traditional quadratic regression model (Table 2). Also note that adding more data reduces the prediction error of both models. Based on the final GP model developed from all 40 experimental data points, the optimization problem defined in eq. (6) is solved. The optimal process condition is found to be: $x_1$ = 120 °C (temperature), $x_2$ = 0.63 bar (partial pressure of oxygen), $x_3$ = 1.00 mmol/15 mL (initial stilbene concentration), $x_4$ = 1250 rpm (stirring rate), and $x_5$ = 120 min (reaction time), and the GP model predicts the conversion rate to be 94.5%.
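The LOOCV figures of merit reported in Table 2 can be computed along the following lines. This is a minimal sketch under stated assumptions: the paper does not give the formula used for $R^2$, so the common definition $1 - SS_{res}/SS_{tot}$ is assumed here, and `mean_model` is a hypothetical placeholder for any fit-and-predict routine (such as a GP or a quadratic regression) with the same call signature.

```python
import math


def loocv_predictions(X, y, fit_predict):
    """Predict each point from a model fitted to all the other points."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        preds.append(fit_predict(X_train, y_train, X[i]))
    return preds


def rmse(y, preds):
    """Root mean squared error of the held-out predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, preds)) / len(y))


def r_squared(y, preds):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, preds))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def mean_model(X_train, y_train, x_new):
    """Hypothetical baseline: always predicts the training mean."""
    return sum(y_train) / len(y_train)


X = [[0.0], [1.0], [2.0]]
y = [1.0, 2.0, 3.0]
preds = loocv_predictions(X, y, mean_model)
```

Because every prediction is made for a point the model has not seen, both metrics measure predictive rather than fitting accuracy. Under this definition $R^2$ can even be negative for a poor model, as it is for the baseline above.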
The actual experiment at this optimal condition attains a conversion rate of 93.5%, which is reasonably close to the predicted value of 94.5% and is regarded as satisfactory under the current experimental constraints.

4. MODEL INTERPRETATION

In this work, model interpretation refers to identifying how the process factors influence the response variable. When traditional polynomial functions are used in RSM, the models are self-explanatory: the regression coefficients clearly indicate the sign and magnitude of the impact of process factors and factor interactions. Furthermore, a powerful technique, analysis of variance (ANOVA), can be employed to identify whether the impact is statistically significant. Unfortunately, when more accurate and complex models (such as GP) are used, such a clear interpretation is lost (Brown, 2007). In this work, we suggest applying global sensitivity analysis (SA) to address this issue.

4.1 Global sensitivity analysis

Global SA is based on a decomposition of the model into main effects and interactions (Saltelli et al., 2008):

$$ y = f(x) = E(y) + \sum_{i=1}^{d} z_i(x_i) + \sum_{1 \le i < j \le d} z_{i,j}(x_{i,j}) + \cdots $$
