Computers chem. Engng, Vol. 20, Suppl., pp. S599-S604, 1996
Pergamon
S0098-1354(96)00109-3
Copyright © 1996 Elsevier Science Ltd. Printed in Great Britain. All rights reserved. 0098-1354/96 $15.00 + 0.00
BATCH PROCESS MONITORING FOR CONSISTENT PRODUCTION

E.B. MARTIN*, A.J. MORRIS, M.C. PAPAZOGLOU and C. KIPARISSIDES+

Centre for Process Analysis, Chemometrics and Control, Department of Chemical & Process Engineering; *Department of Engineering Mathematics, University of Newcastle, Newcastle upon Tyne NE1 7RU, UK. +CPERI, University of Thessaloniki, Greece. E-mail: [email protected], [email protected]
Abstract A number of limitations have inhibited the success of batch process monitoring: the finite and variable duration of a batch, the presence of significant non-linearities, the lack of on-line sensors for measuring quality variables, the absence of steady-state operation, the difficulty of developing accurate mechanistic models, and process measurements that are autocorrelated in time as well as being correlated with one another. Recent approaches to the monitoring of batch behaviour have been based on extensions of the statistical projection methods of Principal Components Analysis (PCA) and Projection to Latent Structures (PLS) - multiway PCA and multiway PLS. These techniques form the basis of the multivariate statistical process control charts for batch process monitoring. The control limits used to detect when a process is moving out of control in multivariate SPC charts are conventionally based upon Hotelling's T² statistic. A new approach which allows the nominal data to dictate the form and shape of the bounds, the M² statistic, is reviewed. Finally, the application of multivariate SPC, and the impact the different confidence bounds have on process operation, is illustrated using a batch methyl methacrylate polymerisation reactor.
1. INTRODUCTION

The major aims of monitoring plant performance are the reduction of off-specification production, the identification of important process disturbances and the early warning of process malfunctions or plant faults. The early detection of process malfunctions, followed by the location of their source, can lead to significant improvements in product quality and consistency. Consequently, on-line performance monitoring has become an integral and extremely important part of batch processing. In many cases, however, industrial batch systems are poorly understood, the relationships between system variables are complex and highly non-linear, and the process variables are difficult to quantify, making the control objectives difficult to achieve.

In batch processes the general objective is to achieve a well-defined end-point. This significantly alters the manner in which modelling and control are implemented, since batch systems tend to pass through a much wider range of dynamic conditions than continuous processes. This wide range of dynamic operation in batch processing places significant demands upon the modelling techniques adopted. In order to minimise the batch-to-batch variation it is essential that the variability between the batch trajectories is monitored and controlled to meet desired state profiles. MacGregor and Nomikos (1992) have extended the multivariate statistical process control (SPC) methods of Principal Component Analysis (PCA) and Projection to Latent Structures (PLS) to batch processes, giving multiway PCA (MPCA) and multiway PLS (MPLS). These approaches allow the monitoring of a batch process to be achieved once a model has been developed from nominal, or good, batch operation (Nomikos and MacGregor, 1994, 1995).

Intrinsically linked with process monitoring is the requirement for the early warning of process malfunctions in a Statistical Process Control (SPC) sense. Two approaches for calculating the control limits used to identify movement of the process away from nominal operating behaviour are reviewed: those based upon the standard distributional assumptions of Hotelling's T² statistic, Tracey et al. (1992), and a novel approach using kernel density estimation and the bootstrap, the M² statistic, Martin and Morris (1995). The latter approach allows the data to dictate the shape of the operating region and questions whether the exclusion of a certain percentage of good batches, as is the case with bounds calculated using Hotelling's T² statistic, is in practice the best approach. The M² statistic gives the user the freedom to decide how many batches should be excluded on the basis of poor performance.

2. THE MULTIVARIATE STATISTICAL APPROACH

The philosophy of Statistical Process Control (SPC) is that the behaviour of a process can be characterised using data obtained when the process is operating in a state of "statistical control". A state of "statistical control" is said to exist if certain key process variables or quality variables remain close to their desired values. Consequently, all future
operation is referenced against this "in control" model. SPC refers to a set of statistical techniques that involve the use of traditional Statistical Quality Control (SQC) charts. The most commonly applied SQC charting methods are the Shewhart charts (X and Range charts), the Exponentially Weighted Moving Average (EWMA) charts and the Cumulative Sum (CUSUM) charts. The objective of these charts is to monitor the performance of a process over time to verify that the process consistently produces within-specification product. Most industrial processes are characterised by multivariate behaviour, i.e. more than one variable affects the process behaviour. The application of univariate techniques can therefore result in misleading information being presented to process operatives, with the possibility of unnecessary or erroneous control actions being taken. The implementation of multivariate statistical process control techniques and the associated control charts can overcome the limitations of univariate process performance monitoring, while retaining the overall simplicity of univariate monitoring charts.

Of considerable relevance and importance is the increasing amount of data being routinely collected on processes. Frequently available are observations on a number of process variables, such as temperatures, pressures, flows and possibly some concentrations. These observations are usually measured every few minutes or less. A data-base of past successful and unsuccessful batches provides a reference or nominal operating set. The major difficulty with large amounts of multivariate data is that the variables being measured are never independent; rather, they are autocorrelated in time and in addition correlated with one another at any given time (collinear). The true dimension of the process is much lower than that implied by the number of variables being monitored. Finally, in batch processes it is not only the relationships between the variables at any one time which are important, but also the past history of the state trajectories of all these variables. In trying to overcome these difficulties, a number of multivariate statistical projection methods have been used, including Principal Component Analysis (PCA) (Jolliffe, 1986) and Projection to Latent Structures (PLS) (Geladi and Kowalski, 1986). These methods are particularly suited to analysing large sets of correlated data. The information contained within the process variables is summarised in terms of a reduced set of latent variables by projecting the information down onto low dimensional subspaces. Principal Component Analysis is used to explain the variability in a single data block. It calculates latent vectors, which are uncorrelated, called principal components, that describe the directions of greatest variability in the data set. Projection to Latent Structures is conceptually similar to PCA, except that it simultaneously reduces the dimensions of both the process and quality variable spaces to find these latent vectors. The combination of these projection methods with multivariate control charts underpins Multivariate Statistical Process Control (MSPC).

The important strategic question is whether the nominal model can be used to identify all possible faults. Control charts are more suited to identifying process drift than identifying faults. The Squared Prediction Error (SPE) or Q-statistic is a more robust fault detection tool.
It is based on the contributions of all the process variables and therefore any fault that can be associated with these variables is more easily identified. Using SPE plots, faults occurring during a batch process operation can be identified. This reveals the potential strength of the MPCA model not only to discriminate between normal and abnormal batches, but also to detect faults. An abnormal batch will exhibit significant deviations in an SPE plot. There are two ways in which a new batch can exhibit deviations from the MPCA model. In the first case, the scores can move outside the acceptable range of variation defined by the control region. These deviations can be observed in any control chart associated with the principal component scores. In this case the model is still valid, but the magnitude of the variation during the new batch is too large. In the second case, the residuals may be large and the batch will be placed well outside the nominal region, perpendicular to the reduced space. These deviations can be detected by the SPE plots. In this case, the model is no longer valid, because a new event, not included in the reference set, has occurred and the new batch is not projected onto the reduced space.

Quality variables are collected on a less frequent basis, with the majority of these measurements being made in the quality control laboratory. These problems of infrequent and time-variable laboratory assays have made the development of monitoring schemes that utilise the information provided by the process variables one of the most attractive approaches to the problem of batch performance monitoring. In this paper, interest will focus on the work related to Principal Component Analysis for batch process performance monitoring. The results are, however, directly transferable to process monitoring (MSPC) using the latent variables from a Projection to Latent Structures analysis.

3. PRINCIPAL COMPONENTS ANALYSIS

Principal Component Analysis is a mathematical tool which reduces the dimensionality of the original data (X) by defining a series of new variables, the principal components, which span the multidimensional space of X. Each of the principal components is a linear combination of the original variables and the components are defined to be orthogonal to one another. The first principal component defines the maximum variance in the data, with subsequent components explaining decreasing levels of variability in the data. The two most popular approaches for calculating principal components are the singular value decomposition (SVD), Golub and Van Loan (1983), and the non-linear iterative partial least squares (NIPALS) algorithm, Geladi and Kowalski (1986). The first principal component is defined as:
$$t_1 = X p_1$$
The loadings, p_i, define the direction of greatest variability, and the principal component score vector, t_i, represents the projection of each object onto p_i. If significant correlations exist between the variables, the number of principal components required to account for most of the variation in the data is less than the number of original variables. The residual variation is generally eliminated through the assumption that the final components represent the noise in the data. PCA is thus a method for analysing a single data matrix (X) for feature extraction, data compression and outlier identification.

4. MULTIWAY PRINCIPAL COMPONENTS ANALYSIS

Batch data differ from continuous data in that the problem is now three dimensional, the added dimension being that of time. The major issue which arises is how to handle the large number of measurements taken on the process, which are themselves not independent. The measured variables are also autocorrelated in time. It is not simply the relationship between all the variables which is important, but the entire past histories of their trajectories. The data reduction technique of principal component analysis can be used to project the information down onto a lower dimensional space which summarises the variables and their time history during previous successful batches, i.e. multiway PCA. A simple way to view multiway PCA is to consider opening out the three-dimensional matrix into a two-dimensional array, by placing each 2-D block consecutively and performing a standard PCA, MacGregor et al. (1994), figure 1.
Figure 1:- The unfolding approach proposed by Nomikos and MacGregor (1993).

Variables are highly correlated when they represent the time trajectories of the original process variables and therefore a considerable reduction in dimension can be achieved when performing multiway PCA. The loadings matrix summarises the information in the data with respect to both the variables and their time variation, and the scores represent a summary of the overall performance of a single batch. The objective of multiway PCA is to decompose the three-way array $\underline{X}$ into a series of principal components consisting of score vectors ($t_r$) and loading matrices ($P_r$, or unfolded $p_r$), plus a residual $E$ which is as small as possible in a least squares sense:

$$\underline{X} = \sum_{r=1}^{R} t_r \otimes P_r + E \qquad \text{or} \qquad X = \sum_{r=1}^{R} t_r p_r^{T} + E$$
Multiway PCA not only utilises the magnitude of the deviation of a variable from its mean trajectory, but also the correlation structure between the variables. Indeed, it is this correlation structure which can be particularly important in the detection of faults.
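To make the unfolding concrete, the following sketch (Python/NumPy) unfolds a three-way batch array batch-wise and extracts scores and loadings by a singular value decomposition. The array dimensions and random data are illustrative assumptions only, not values taken from the paper.

```python
# A minimal sketch of multiway PCA by batch-wise unfolding.
# The array dimensions and random data are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 40, 5, 100                        # batches x variables x time points
X3 = rng.normal(size=(I, J, K))             # three-way data array

# Unfold into an (I x JK) matrix by placing the time slices side by side,
# as in figure 1, then centre and scale each column so that deviations are
# taken about the average trajectory of every variable at every time point.
X = X3.transpose(0, 2, 1).reshape(I, J * K)
Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Standard PCA on the unfolded matrix via the singular value decomposition.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
R = 2                                       # number of retained components
T = U[:, :R] * s[:R]                        # (I x R) score vectors t_r
P = Vt[:R, :].T                             # (JK x R) unfolded loadings p_r

E = Xc - T @ P.T                            # residual, minimised in a least squares sense
```

Each row of T summarises the overall performance of one batch, while the columns of P summarise the variables and their time variation.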
5. CONFIDENCE BOUNDS (Definition of Warning and Action Limits)

Nomikos and MacGregor (1995) demonstrated the application of the MPCA model for monitoring purposes using nominal operating data. Once a model has been developed which is reflective of the nominal operating region, multivariate control charts with control limits defined using modifications of Hotelling's T² statistic can be constructed. A recent breakthrough in the generation of confidence bounds which acknowledges the natural distribution of the data is that of likelihood-based confidence bounds. This new approach allows the data to dictate the shape of the confidence bounds, providing operational personnel, for the first time, with the facility to include all batches which they believe conform to nominal behaviour. These data-based (non-parametric) confidence bounds are less likely to incorporate regions of non-conforming operation, Martin and Morris (1995).
5.1 Hotelling's Squared Distance
By assuming the underlying k-dimensional process is normally distributed, we can determine whether the process is in control by calculating Hotelling's (1947) squared distance for the pair of principal components of primary interest, normally principal components 1 and 2:

$$T_0^2 = n(\mu_0 - \bar{x})^T S^{-1} (\mu_0 - \bar{x})$$

and identifying points outside the calculated limits. T² is distributed as the statistic $(n-1)kF/(n-k)$, where F has a central F-distribution with k and n-k degrees of freedom. These results are used to establish control limits with a 100α% chance of a false alarm, where α is typically 0.01 or 0.05. An out-of-control signal is identified when:

$$T_0^2 > \frac{(n-1)k}{n(n-k)}\, F_{k,\,n-k;\,\alpha}$$
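As an illustration, a T² chart on the retained scores might be computed along the following lines. This is a hedged sketch: the score matrix is assumed to come from the MPCA sketch above, and the constant used in the control limit follows the common (n-1)k/(n-k) form, which may differ from the exact constant of a particular chart variant.

```python
# A hedged sketch of a Hotelling's T^2 chart on the first k retained scores.
# The control-limit constant is the common (n-1)k/(n-k) form and may differ
# from the exact constant quoted for a particular chart variant.
import numpy as np
from scipy import stats

def t2_statistic(T):
    """T^2 for each row of the (n x k) score matrix T about the score mean."""
    d = T - T.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(T, rowvar=False))
    return np.einsum('ij,jk,ik->i', d, S_inv, d)   # (x - xbar)' S^-1 (x - xbar)

def t2_limit(n, k, alpha=0.05):
    """F-based control limit giving a 100*alpha % chance of a false alarm."""
    return k * (n - 1) / (n - k) * stats.f.ppf(1.0 - alpha, k, n - k)

scores = np.random.default_rng(1).normal(size=(40, 2))   # placeholder scores
out_of_control = t2_statistic(scores) > t2_limit(*scores.shape)
```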
5.2 Likelihood-Based Confidence Regions

An alternative approach to defining the nominal operating region is to construct a likelihood-based confidence region, using bootstrap techniques and non-parametric density estimation. The standard bootstrap is a technique which generates new data by sampling with replacement from the original data. One approach to non-parametric density estimation is the kernel estimator. The kernel estimator is defined as a sum of 'bumps' placed at the observations, figure 2.
Figure 2:- Kernel estimator: estimated density f(x) plotted against x.
The kernel density function is defined as:

$$\hat{f}(x) = (Bh^d)^{-1} \sum_{i=1}^{B} K\{(x - Y_i)/h\}, \qquad x \in \Re^d$$
where h is the window width. The kernel function, K, defines the shape of the bumps whilst the window width determines their width. The effect of varying the window width is that, in the limit as h tends to zero, a sum of Dirac delta function spikes at the observations results; as h becomes large, all detail, spurious or otherwise, is obscured. Although other kernel functions can be used, there is very little to choose between their performance on the basis of the integrated squared error, Silverman (1986). The nominal operating region for the M² statistic is constructed as a likelihood-based 95% or 99% confidence region for a vector parameter θ of length d, using the bootstrap and non-parametric density estimation, Hall (1987). B independent samples are drawn from X, the score vectors, using the non-parametric bootstrap, i.e. sampling directly from the original vectors with replacement. The variable $Y_i = n^{1/2}\hat{V}^{-1/2}(\hat{\theta}_i^{*} - \hat{\theta})$ is calculated for each of the bootstrap samples and a density, in this particular case a kernel density estimator, can then be fitted to the distribution of the Y_i's.
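The two ingredients of this construction, the non-parametric bootstrap and the kernel density estimate, can be sketched as follows. This is illustrative only: the 2-D scores are placeholders, scipy's gaussian_kde stands in for the kernel estimator, and using the Cholesky factor of the score covariance as a matrix square root is an assumption, not the exact M² construction of Martin and Morris (1995).

```python
# A minimal sketch of the bootstrap / kernel-density ingredients behind a
# data-driven nominal region. The scores are placeholders, gaussian_kde stands
# in for the kernel estimator, and the Cholesky factor is used as V^(1/2).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
scores = rng.normal(size=(40, 2))               # nominal score vectors (placeholder)
n, d = scores.shape

theta_hat = scores.mean(axis=0)                 # parameter estimate from the nominal set
L = np.linalg.cholesky(np.cov(scores, rowvar=False))   # V_hat = L L^T

# Non-parametric bootstrap: B resamples of the score vectors, with replacement.
B = 1000
idx = rng.integers(0, n, size=(B, n))
theta_star = scores[idx].mean(axis=1)           # bootstrap replicates of theta_hat

# Y_i = n^(1/2) V^(-1/2) (theta*_i - theta_hat), as in the text.
Y = np.sqrt(n) * np.linalg.solve(L, (theta_star - theta_hat).T).T

# Kernel density estimate fitted to the Y_i's (window width via Scott's rule);
# the 95% or 99% likelihood region for theta is then read off from the
# contours of this density (Hall, 1987).
kde = gaussian_kde(Y.T)
```

A contour enclosing the desired fraction of the nominal batches then defines the operating region, leaving the choice of how many batches to exclude to the engineer rather than to a distributional assumption.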
5.3 Engineering Implications of the M² Statistic

In selecting a reference or nominal data set, the inherent assumption is that all production defined by this data set is acceptable to the customer. With Hotelling's T² statistic, by definition, a proportion of the data will theoretically lie outside the confidence bounds; for example, for a sample of size 20, one point would be expected to lie outside the 95% confidence bound, provided that the data arose from a normal distribution. The proposed M² approach has built-in flexibility which enables the user to select the proportion of data which lies outside the bounds. In practice, if all the data are from valid production then the bounds should enclose all the nominal data. Selection of the bounds then becomes an engineering issue as opposed to a statistical exercise. Indeed, in some applications of the T² confidence
bound to batch processes, large areas of 'empty space' appear within the bounds, Nomikos and MacGregor (1995). This tends to imply that a new batch lying significantly far away from previous production, but still within the T² bounds, is arbitrarily deemed to be good. Such an assumption is unrealistic.

6. APPLICATION TO A BATCH POLYMERISATION REACTOR

The batch polymerisation reactor studies used to illustrate the implementation of multiway PCA are based on a detailed mathematical model and simulation of a pilot scale methyl methacrylate (MMA) reactor installed at the University of Thessaloniki in Greece. Heating and cooling of the reaction mixture is achieved by circulating water at an appropriate temperature through the reactor jacket. The reactor temperature is controlled by a cascaded regulator system consisting of a primary PI and two secondary PI control loops. The manipulated variables for the two secondary regulators are the hot and cold water flow rates. These streams are mixed prior to entry to the reactor jacket and provide a flexible heating/cooling system. The mathematical model, which includes reaction kinetics and heat and mass balances, has been developed to provide a rigorous simulation which has been validated against the pilot plant. Using this simulation, representative studies of reactor operation and the effects of process malfunctions and faults can be realistically carried out. In this study, a number of good production batches were obtained by Monte Carlo simulation to provide a nominal (or reference) data set. Two types of malfunction are studied: initiator impurity problems and reactor fouling problems. Multiway PCA was carried out on the reference data set and new data monitored during the occurrence of process malfunctions were analysed.

Multiway PCA was applied to the forty nominal batches. One principal component explained approximately 69% of the overall variability in the data, whilst 90%, 94% and 96% of the cumulative variance was explained by two, three and four components, respectively. The methodology of cross-validation was used to determine the number of principal components to incorporate into the nominal model, Wold (1978). Two latent variables were identified as being necessary and sufficient to describe the predictable variation of all the variables about their average trajectories. This provides a model which adequately describes the normal trajectory of a batch run.
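For illustration, the cumulative explained-variance figures quoted above can be recovered from the singular values of the unfolded, scaled data, as in the brief sketch below (reusing Xc from the earlier MPCA sketch). The cross-validation procedure of Wold (1978), used to fix the number of components, is not reproduced here.

```python
# A brief sketch of cumulative explained variance from the singular values of
# the unfolded, mean-centred and scaled data Xc (see the MPCA sketch above).
# The cross-validation of Wold (1978) is not reproduced here.
import numpy as np

def cumulative_explained_variance(Xc, max_pc=4):
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values
    return np.cumsum(s**2 / np.sum(s**2))[:max_pc]

# For the forty nominal MMA batches this would return values of roughly
# 0.69, 0.90, 0.94 and 0.96 for one to four components.
```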
Figure 3:- Scores plot for principal components 1 and 2 with the nominal region defined using Hotelling's T² statistic.

Figure 4:- Scores plot for principal components 1 and 2 with the nominal region defined using the M² approach.
Figure 3 shows the results of a multiway PCA analysis with the nominal operating region defined using Hotelling's T² statistic. In contrast, figure 4 shows the results based upon the M² statistic. The three batches lying close to the nominal operating region are associated with a fouling problem. It is interesting to note that, in this application, the standard approach draws within its bounds one of the batches identified as being associated with a fouling problem. In comparison, the new empirical metric M² does not include the non-conforming batch within the nominal data region. Hotelling's T² approach is generally conservative in its definition of the bounds, and typically areas with no information on batch outcome are encompassed within the bound, as clearly demonstrated in figure 3. The area between the warning and the action limits contains no information on the outcome of a batch, but the operator would be required to assume that the batch quality was still satisfactory even though no prior information supported this assumption. This problem does not arise with the new M² metric.
Batches can also be compared with MPCA by plotting their Sum of Squared Errors (SSE) or Q residuals:

$$Q_i = \sum_{j=1}^{m \cdot I} E(i,j)^2$$
The Q residuals from this model, for each of the forty good batches, are shown in Figure 5. The Q values represent the squared perpendicular distance of the (m×I)-dimensional point of each batch from the reduced space defined by the two principal components of the MPCA model. From this figure it can be seen that all the batches have been explained adequately, since none of them has an unusually large residual exceeding the 99% confidence limit.
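A sketch of the Q (SPE) computation is given below, reusing the scores, loadings and scaled data from the MPCA sketch above; the 95% and 99% confidence limits shown in figure 5 are not derived here.

```python
# A sketch of the Q (SPE) residual for each batch, reusing Xc, T and P from
# the MPCA sketch above; the 95% and 99% confidence limits of figure 5 are
# not derived here.
import numpy as np

def q_residuals(Xc, T, P):
    """Q_i = sum_j E(i, j)^2: squared distance of batch i from the model plane."""
    E = Xc - T @ P.T                    # residual matrix
    return np.sum(E ** 2, axis=1)       # one Q value per batch

# A new batch is first projected onto the model: t_new = x_new @ P,
# e_new = x_new - t_new @ P.T, Q_new = e_new @ e_new.
```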
Figure 5:- Q residuals of the reference database from the MPCA model, with 95% and 99% confidence limits.

7. CONCLUSIONS

The paper has described an application of multivariate statistical process control to batch processes and has presented a new approach to the generation of confidence bounds for use in the analysis of process data and in multivariate statistical process control. The approach provides a new, alternative statistic (the M² statistic) to Hotelling's T² statistic that acknowledges the natural distribution of the data. One of the major advantages of the likelihood-based confidence region is that the density of the data defines the orientation of the bounds; consequently, regions of non-conforming operation are less likely to be incorporated, as is necessarily the case for regions based on Hotelling's T² statistic. Work is continuing in this area, in particular the investigation of region updating with the M² confidence bound and its application to non-linear SPC.

8. ACKNOWLEDGEMENTS

The authors would like to acknowledge the support of the Departments of Chemical and Process Engineering and Engineering Mathematics, and funding from the European Community under grant BRE2-CT93-0523 / Project Number BE-7009 INTELPOL, Intelligent Manufacture of Polymers.
9. REFERENCES

Geladi, P. and Kowalski, B.R. (1986). "Partial Least Squares Regression: A Tutorial", Anal. Chim. Acta, Vol. 185, pp. 1-17.

Golub, G.H. and Van Loan, C.F. (1983). Matrix Computations, North Oxford Publishing Co.

Hall, P. (1987). "On the Bootstrap and Likelihood-Based Confidence Regions", Biometrika, Vol. 74, pp. 481-493.

Jolliffe, I.T. (1986). Principal Component Analysis, Springer-Verlag, New York.

MacGregor, J.F. and Kourti, T. (1995). "Statistical Process Control of Multivariate Processes", Control Eng. Practice, Vol. 3, No. 3, pp. 403-414.

MacGregor, J.F., Nomikos, P. and Kourti, T. (1994). "Multivariate Statistical Process Control of Batch Processes Using PCA and PLS", Preprints of the IFAC ADCHEM'94 Conference on Advanced Control of Chemical Processes, May, Kyoto, Japan.

Martin, E.B. and Morris, A.J. (1995). "Non-parametric Confidence Bounds", Internal Report, Centre for Process Analysis, Chemometrics and Control (CPACC), University of Newcastle.

Nomikos, P. and MacGregor, J.F. (1994). "Monitoring of Batch Processes Using Multiway Principal Components Analysis", AIChE Journal.

Nomikos, P. and MacGregor, J.F. (1995). "Multivariate SPC Charts for Monitoring Batch Processes", Technometrics, Vol. 37, pp. 41-59.

Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis, Chapman and Hall.

Tracey, N.D., Young, J.C. and Mason, R.L. (1992). "Multivariate Control Charts for Individual Observations", Journal of Quality Technology, Vol. 24, No. 2, pp. 88-95.

Wold, S. (1978). "Cross Validatory Estimation of the Number of Components in Factor and Principal Components Models", Technometrics, Vol. 20, No. 4, pp. 397-404.