International Journal of Systems Science, 2000, volume 31, number 11, pages 1459–1471
Fault detection in continuous processes using multivariate statistical methods

P. R. GOULDING†*, B. LENNOX†, D. J. SANDOZ†, K. J. SMITH† and O. MARJANOVIC†

The approach to process monitoring known as multivariate statistical process control (MSPC) has developed as a distinct technology, closely related to the field of fault detection and isolation. A body of technical research and industrial applications indicates a unique applicability to complex large-scale processes, but has paid relatively little attention to generic live process issues. In this paper, the impact of various classes of generic abnormality in the operation of continuous process plants on MSPC monitoring is investigated. It is shown how the effectiveness of the MSPC approach may be understood in terms of model- and signal-based fault detection methods, and how the multivariate tools may be configured to maximize their effectiveness. A brief review of MSPC for the process industries is also presented, indicating the current state of the art.

Accepted 11 October 1999.
† Control Technology Centre, School of Engineering, University of Manchester, Manchester, UK.
* Author to whom correspondence should be addressed. E-mail: [email protected]
1. Introduction
Fault detection in chemical systems has been the subject of numerous studies in the past few decades. Initial work in the area employed a variety of paradigms to both detect and characterize faults, including signal-based, model-based and knowledge-based approaches (Willsky 1976, Frank 1996, Isermann 1997). These methods have proven very successful whenever cost–benefit economics have allowed for the considerable effort involved in developing applications. An alternative approach to fault detection and diagnosis that has received considerable interest in recent years is based on the use of multivariate statistical techniques (MacGregor and Kourti 1995, Wise and Gallagher 1996). These approaches, collectively referred to as multivariate statistical process control (MSPC), were originally developed for the monitoring of continuous and batch processes (Kresta et al. 1991, Nomikos and MacGregor 1995). The name reflects an association with (univariate) statistical process control methods, and much of the technology finds some counterpart therein. The word 'control' can be somewhat misleading, as links with automatic control are still emergent (Piovoso and Kosanovich 1994, Chen et al. 1998), and the major thrust of the technology can be seen to run parallel to that of fault detection and isolation.

In the context of fault detection on a continuous plant, MSPC may be seen as a response to the complexity and quantity of information available in a modern computerized installation. Knowledge of plant behaviour is often of poor quality, and the more advanced model- and signal-based approaches are hampered by non-stationary, nonlinear behaviour, and by the sheer size of the system. By contrast, MSPC has traditionally employed static linear analysis, and has avoided giving individual attention to all possible faults, preferring to look at deviations from normal operation in a generic sense. By exploiting the presence of highly correlated sensors in a typical process plant to offset the limitations of this simplicity, a very powerful means of addressing the problem is obtained. At present, the practical technology remains firmly based on two techniques (or their close analogues): principal component analysis (PCA) and partial least squares (PLS), also known as projection to latent structures (Geladi and Kowalski 1986, Jackson 1991). Advances are concentrated on a number of important fronts, including nonlinear and classification issues (Wold et al. 1989, Kramer 1992, Dong and McAvoy 1996, Chen et al. 1996); the dynamic nature of plant data (Dayal and MacGregor 1996); and configuration within a feedback setting (Chen et al. 1998). The industrial future of
the technology lies now in its generic applicability, as evidenced by its support by major process automation and solution vendors. Existing work has largely concentrated on application-specific concerns, and on technological innovations which remain to be validated in the context of generic application. This paper attempts to discover the potential for the existing methods to detect and identify generic classes of process fault, and to identify configurations of the methods which maximize this capability. Further, we attempt to set the multivariate statistical methods in the broader framework of the fault detection literature. The work focuses on continuous processes only.

This paper is divided into six main sections. Section 2 describes a number of general process faults and characteristics that are specific to process systems. Section 3 then describes PCA and PLS in detail, and explains how the information resulting from MSPC can be interpreted to both detect and isolate process faults. This is followed in section 4 with a demonstration, using a simulated system, of how MSPC was interpreted to detect and isolate a series of generic process faults. In section 5, the data interpretation techniques are applied to industrial data obtained from a chlorocarbon production plant. Finally, a brief summary and conclusions section is presented.

2. Plant models and faults
In this section we present definitions and taxonomies relevant to the generic fault detection issues addressed in this paper. Much of the work presented in this paper focuses on a simulated four-input, four-output generic chemical plant. As the plant is intended to highlight issues of a generic nature, we consign its description to Appendix C and figure C1.

The plant is represented as a directed graph whose vertices are signals and whose edges are the dynamic relationships between them. For any two vertices, a and b, we term a an 'origin' of b if the direction of a connecting path flows into b. The vertex b is then described as an 'insertion' of a. In a generic sense, we term a vertex an origin if it is the origin of one or more distinct vertices; in a similar fashion, a vertex is generically termed an insertion if it is the insertion of one or more distinct vertices. An origin which is not also an insertion we call a 'source'. The graph containing a set of vertices which are unconnected to any other set is termed a sub-plant. (A short code sketch of this vertex classification appears at the end of this section.)

In any given sub-plant, the independent signals may be identified with the set of sources. For a typical sub-plant under feedback control, such as that depicted in figure C1, the sources include the set-point values (s), the measurement noise (ν) and the disturbance processes (η). It is relevant that, under feedback control, the actuator values (x) and the controlled values (y) are junctions of one another.

We define a plant 'artefact' to be behaviour of any set of plant components which would not occur under
normal operation. As such, all faults are considered artefacts, although the reverse is not necessarily true. We define a 'primitive artefact' as a change in the plant parameters associated with any individual edge or source. In these terms, the possible artefacts become all possible combinations of the primitive artefacts. The primitive artefacts can be partitioned into changes in the statistics of a source, and changes in the dynamics associated with an edge.

From the point of view of the process engineer, measured signals in a plant can be broken down into the following types: set-point, actuator, controlled output, uncontrolled output and disturbance. The primitive artefacts can therefore be further subdivided as follows. Changes in:

(1) set-point statistics;
(2) dynamics associated with any edge associated with a controller;
(3) dynamics associated with any edge not associated with a controller;
(4) disturbance statistics;
(5) measurement error statistics on actuators;
(6) measurement error statistics on controlled outputs;
(7) measurement error statistics on uncontrolled outputs.
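To make the graph taxonomy concrete, the following Python sketch classifies the vertices of a signal-flow graph into origins, insertions and sources. It is illustrative only: the edge list is hypothetical (it is not the Appendix C plant), and the classification works on direct edges, whereas the definitions above are stated in terms of connecting paths, so a transitive closure would be needed in general.

```python
# A minimal sketch (not the Appendix C plant): classifying the vertices of a
# plant signal-flow graph into origins, insertions and sources, as defined above.

def classify_vertices(edges):
    """edges: iterable of (a, b) pairs meaning a path flows from a into b."""
    origins, insertions = set(), set()
    for a, b in edges:
        origins.add(a)       # a is an origin of b
        insertions.add(b)    # b is an insertion of a
    sources = origins - insertions   # origins that are not also insertions
    return {
        "origins": origins,
        "insertions": insertions,
        "sources": sources,
        "vertices": origins | insertions,
    }

if __name__ == "__main__":
    # Hypothetical feedback loop: set-point s1 and disturbance eta1 are sources;
    # actuator x1 and controlled output y1 are insertions of one another.
    example_edges = [("s1", "x1"), ("x1", "y1"), ("y1", "x1"), ("eta1", "y1")]
    print(classify_vertices(example_edges))
```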
3. Component technologies
The work described uses the methods of principal component analysis (PCA) and partial least squares (PLS), also known as projection to latent structures. These methods are briefly outlined in Appendix A; full descriptions may be found in Geladi and Kowalski (1986) and Jackson (1991).

3.1. Confidence bounds and assigning cause

In practice, both PCA and PLS are trained on a set of normal process operating data. The results are then used to analyse fresh process data in two ways: to assign confidence bounds which then act as detectors for the presence of artefacts; and to attribute causes to violations of these bounds. There are a number of approaches to these tasks.

For PCA, the analysis is applied separately to data variation in the space of the principal components and in the prediction errors. Initially we examine the case for a single vector of fresh data, z, normalized using the same parameters as the training data. Variation in the prediction errors is measured using ‖E‖₂², a statistic called the squared prediction error (SPE) or, alternatively, Q. Confidence limits are usually obtained by employing normal distribution assumptions (Jackson 1991).
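As an illustration of the Q (SPE) statistic just described, the following NumPy sketch fits a PCA model to a hypothetical set of normal operating data and evaluates the squared prediction error of a fresh vector. The data, variable names and the number of retained components are assumptions of the example, not values taken from the paper.

```python
# A minimal NumPy sketch of the Q (SPE) statistic. The PCA model is fitted by
# singular value decomposition of mean-centred, unit-variance training data;
# the synthetic data and the choice of n_p are illustrative only.

import numpy as np

def fit_pca(Z_train, n_p):
    """Return normalization parameters, loadings P (columns p_k) and score variances."""
    mu = Z_train.mean(axis=0)
    sd = Z_train.std(axis=0, ddof=1)
    Zn = (Z_train - mu) / sd
    U, s, Vt = np.linalg.svd(Zn, full_matrices=False)
    P = Vt[:n_p].T                                    # loading vectors p_1..p_np
    score_var = (s[:n_p] ** 2) / (Zn.shape[0] - 1)    # sigma_k^2 of each component
    return mu, sd, P, score_var

def spe(z, mu, sd, P):
    """Squared prediction error Q = ||E||_2^2 for a single fresh vector z."""
    zn = (z - mu) / sd                  # normalize with the training parameters
    e = zn - P @ (P.T @ zn)             # prediction error after projection onto the PCs
    return float(e @ e)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.normal(size=(500, 2))
    Z_train = t @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(500, 6))
    mu, sd, P, score_var = fit_pca(Z_train, n_p=2)
    z_new = Z_train[0] + np.array([0, 0, 0.5, 0, 0, 0])   # small deviation on one sensor
    print("Q =", spe(z_new, mu, sd, P))
```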
Variation in the space of the principal components is measured using a quadratic form to indicate the distance of a given vector from the centre of the training data. Assuming the np PCA loadings, p_k, have been calculated for the training data, the distance, d(z), is given by

    d(z) = zᵀ [ Σ_{k=1}^{np} (p_k p_kᵀ / σ_k²) ] z
where σ_k² is the variance of the kth principal component. If we define a scalar t̄_k = zᵀ(p_k/σ_k), then each term in the sum is of the form zᵀ(p_k p_kᵀ/σ_k²)z = t̄_k². The vector t̄_z = [t̄_1, t̄_2, ..., t̄_np] is thus a scaled sub-vector of the score of z. The scaling is such that the covariance matrix of the t̄_k for the training data is the unit matrix. Because there is no preferred directionality in the new basis, confidence bounds applied to this measure are usually assigned as spheroids, based on ‖t̄_z‖₂². The distance measure ‖t̄_z‖₂² is termed the T² statistic. Arguments based on the normal distribution lead to a similar measure, and can be applied to obtain appropriate confidence limits (Jackson 1991).

Plant operator displays often show plots of the lower index scores plotted against one another in pairs, occasionally with the Q statistic as a third dimension. Confidence bounds on these plots emerge as ellipses, projected from the (re-scaled) spheroidal bounds noted above. Visual interpretation of these plots becomes difficult, however, in instances where one axis of a given ellipse is significantly longer than the other. In such cases the display will compress data in the direction of the shorter axis to the point where it occupies only a small fraction of the display area. In studies we have conducted, plant staff have expressed a preference for circular bounds obtained by plotting the scaled scores, t̄_k, noted above (Goulding 1995, Goulding et al. 1996).

Sensitive detection of artefacts requires detection of changes over time in the joint probability distribution function of the process variables (Gibson and Melsa 1975). Tracking such changes requires analysis of a matrix of historical data (or state variables arising from such a history), and sophisticated sequential techniques have been developed in other areas of fault detection to carry out this task (Basseville 1988, Frank 1996). Practice in MSPC appears to be lagging these advances, reflecting the almost exclusive application of static models in the field. The most common applications involve tests applied to individual data points, as described above. More sophisticated approaches employ a sliding window or an exponentially weighted moving average (EWMA) approach (MacGregor and Kourti 1995). Confidence limits are then based on normal assumptions, either directly or using the concept of the average run length (ARL, Crowder 1987).
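Continuing the sketch above, the scaled scores and the T² statistic of a fresh vector may be computed as follows. This is again illustrative and reuses fit_pca from the previous listing.

```python
# Continuing the earlier sketch: the scaled scores t̄_k = zᵀ(p_k/σ_k) and the
# statistic T² = ||t̄_z||₂², using the model returned by fit_pca().

import numpy as np

def t2(z, mu, sd, P, score_var):
    """T² statistic for a single fresh vector z (sum of squared scaled scores)."""
    zn = (z - mu) / sd
    t_bar = (P.T @ zn) / np.sqrt(score_var)   # unit covariance on the training data
    return float(t_bar @ t_bar)

# Usage, following on from the previous example:
# print("T2 =", t2(z_new, mu, sd, P, score_var))
```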
The chemical process plant is typically a nonlinear and time-varying dynamic system, incurring non-stationary random contributions arising from non-Gaussian sources. These factors place it far from the ideals assumed by the static, normally distributed models upon which the MSPC approaches rely for change detection. It could be argued that the most restrictive of the assumptions made by the MSPC approach to artefact detection is that of normality. If we attempt to identify a non-stationary nonlinear plant using a stationary linear model, then even idealized strictly stationary white Gaussian disturbances will appear otherwise under the restrictions imposed by the identification procedure. The impact of non-normal statistical behaviour on hypothesis testing (detection of artefacts) may be severe. Non-normal behaviour can also reduce the effectiveness of PCA and PLS in providing parsimonious models (models with a low number of latent variables), by virtue of the quadratic norms employed in the derivations of these approaches. If this emerges as a problem then robust equivalents should be employed (Jackson 1991, Chen et al. 1996). Regardless of the success or otherwise of attempts to address the nonlinear and dynamic issues, however, hypothesis testing for process plants should address the presence of clustered data and non-normal disturbances. Detection of change based on non-normally distributed data has been addressed powerfully in the fields of non-parametric statistics (Kendall 1979–1983) and neural pattern recognition (Ripley 1996). These technologies have been slower to gain ground in the MSPC community than elsewhere. In particular, we mention the use of the bootstrap approach to obtaining detection limits, and the increasingly powerful kernel density methods (Chen et al. 1996, 1998), which represent a fusion of neural and statistical modelling.

Assignment of cause is the practice of assessing which plant variables have contributed to observed out-of-control behaviour. A number of approaches to this task are employed within MSPC, including assessment of contributions to the T² statistic (Wise and Gallagher 1996), contributions to the Q statistic (Miller et al. 1993), model prediction errors (Tong and Crowe 1995, Dunia et al. 1996), contributions to individual scores (Montague et al. 1998), and partial correlations or similar approaches (Ibrahim and Tham 1995). More complex classification approaches may be used where known fault conditions have been studied (Hand 1981, 1982). Contributions to the T² statistic are obtained by taking the gradient of T² with respect to each variable. We present the method in Appendix B for the general case, suitable for both orthogonal and non-orthogonal loadings, the latter being required for PLS. Assessing contributions to the individual scores is also possible, and is a more common approach; this method is also presented in Appendix B. We consider the contribution to T² the more reliable of the two, as it is usual for more than one score to indicate out-of-control behaviour. The contribution to the Q statistic from a given variable is simply the squared prediction error on that variable.

Assigning causes for even a relatively simple plant will almost certainly require the use of additional information to narrow down the search for candidate artefacts. Such difficulties argue for an intelligent system to assist in this task. The points raised above with regard to deficiencies in the standard MSPC approach to change detection are especially relevant in this regard.
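The following sketch illustrates the contribution calculations just described, reusing the PCA helpers from the earlier listings. The Q contributions follow the text directly (the squared prediction error on each variable); for the T² contributions, one common gradient-based variant is shown as a stand-in, since the exact form used by the authors is given in their Appendix B, which is not reproduced in this section.

```python
# A sketch of contribution analysis. Q contributions are the squared prediction
# errors on each variable; the T² contributions shown here use one common
# gradient-based variant, c_j = z_j (M z)_j with M = P diag(1/σ_k²) Pᵀ, which
# sums to T². This is an assumed form, not necessarily the paper's Appendix B.

import numpy as np

def q_contributions(z, mu, sd, P):
    """Per-variable contributions to Q (they sum to Q)."""
    zn = (z - mu) / sd
    e = zn - P @ (P.T @ zn)
    return e ** 2

def t2_contributions(z, mu, sd, P, score_var):
    """Gradient-based per-variable contributions to T² (they sum to T²)."""
    zn = (z - mu) / sd
    M = P @ np.diag(1.0 / score_var) @ P.T
    return zn * (M @ zn)

# Ranking the variables by either set of contributions gives the putative causes:
# order = np.argsort(q_contributions(z_new, mu, sd, P))[::-1]
```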
3.2. Configuration and behaviour of PLS and PCA

We apply PCA and PLS to the sub-plant pictured in figure C1. The plant has four inputs (labelled x) and four outputs (labelled y), three of which are under feedback control; set-points for these three are labelled s. Significant disturbances and measurement noise are as shown. PCA is applied to actuator values and outputs separately, and also to both of these sets of signals together. In what follows, we describe some general observations on the behaviour of these tools when employed in this manner.

Applying MSPC to process data can yield different results depending upon which elements of the data are being analysed. For example, the algorithms can be applied to causal data (X), effect data (Y) or a combination of the two. The following paragraphs detail a number of observations relating to such an application of MSPC.

In principle, where there exists a significant relationship between two sets of variables, X and Y, analysis of these data together may provide better detection of faults than looking at them separately as X and Y (Mathai and Rathie 1975, Basseville 1989). This observation may be studied for the case of PCA through its relationship to the so-called Fisher information measure. While proofs of this finding are complex (Goulding et al. 1999), an intuitive explanation is provided here, supported by results from the applications presented below. Let us assume we have a plant with two variables, x and y, each with mean square variation σ_x² = σ_y² = 1. Assume that we have one principal component only (and so the subspace associated with Q is of dimension 1). By increasing the correlation between x and y, we can force Q → 0, as increasing the correlation leaves little variation unexplained by the single principal component (see the text below relating PCA to total least squares for justification of this fact). As Q decreases to zero,
even a very small disturbance (with magnitude ≪ 1, for example) to the measured data will become easy to detect in the subspace associated with Q, as we expect the measured signal to have negligible power in that region. The argument is completed by noting that if we analysed the variables x and y separately, such a small disturbance could not be detected, as x and y vary with a much larger magnitude. While the argument above has been considerably simplified and stripped of rigour, it does lend intuitive support to our findings that joint analysis of cause and effect variables has a tendency to expose faults.

There do exist counterexamples, however. Faults affecting only plant measurements whose associated loadings are negligible in the low-variance subspace, i.e. the subspace of Q, and also in those loading vectors with small associated eigenvalues, will not easily be detected. This situation may arise in at least two cases: whenever plant excitation during capture of the training data was not rich; or where a good plant model can be found which excludes the variable(s) in question. An example of this latter cause is seen later in section 4. A further caveat is thrown up by the analysis of Goulding et al. (1999). Joint PCA analysis of X–Y will in general increase the dimensionality of the subspace associated with the residuals, Q. This effect may force the setting of less sensitive detection thresholds. These facts notwithstanding, joint analysis of X–Y is recommended as the preferred approach for a general plant. Justification for these theoretical findings is offered in application to benchmark problems and to live plant data in Lennox et al. (1999).

It can be argued that for the majority of plants operating in closed loop, the most significant deviations in the inputs will fall within a subspace over which control of the outputs is effected. It follows from this observation that the PLS latent variables should cover approximately the same subspace as that covered by the principal components of the input variables (PCA-X).

In certain situations, artefacts affecting the statistical behaviour of a subset of plant variables (we call them 'dumb variables') are not relevant to plant monitoring. For example, if it is not desirable to monitor the performance of operating teams in a manually controlled plant, the statistical behaviour of the inputs, in isolation from the other variables, may be irrelevant. In such circumstances, the statistical variation of the remaining plant variables should be conditioned on the dumb variables. We have developed a number of approaches to this task. Here we take a simple regression-based approach, which amounts to the following procedure. Let the complete set of plant variables be termed Z, and the set of dumb variables be D ⊂ Z. Then the set of non-dumb variables is the complement of D, which we denote D′.
• Regress the variables D′ on D, using data taken from the plant during normal operation.
• For the data upon which we wish to test for a possible fault, use the model obtained from the regression to generate prediction errors for each variable in D′. Denote these prediction errors by L.
• All fault detection procedures are subsequently applied to L.

In the context of the linear models employed in this work, we have employed PLS in the regression procedure (a code sketch of this conditioning and residual-monitoring procedure is given at the end of this section). Simple monitoring of the outputs is still required, in order to ensure they remain in control. Model prediction errors arising in this way also provide information on the validity of the model used to obtain them. As a result, the approach may be used to detect errors in the edges of the plant diagram.

It is relevant here that the higher index loadings of a PCA model (those associated with the residuals) furnish models relating the plant variables (Nievergelt 1994). If a plant fault lies in a source, rather than an edge in the plant diagram, the models may still hold, and the artefact will tend to be displayed in the scores, rather than the residuals. For this to be the case, all the models implicit in the higher components must hold true, which will in general only be the case if the plant has been richly excited over the period in which the training data were captured. Because the plant variation in the space of the principal components may be very large, the Q statistic will be much more sensitive to other changes than T². In cases where the training data have been rich, the impact that an artefact has on each statistic may therefore be used to discriminate between certain classes of fault, as described above. With this exception, the primitive artefacts may in general affect both the residuals and scores, and so the distinction between residuals and scores is not relevant for diagnosis.

All process plants are nonlinear, albeit to varying degrees. Because PCA identifies linear relationships, confidence limits should in principle be placed on Q conditioned on each data vector. Because this information is not directly available from PCA analysis, we observe that a high value of T² indicates a poor density of training data in the region of the measured vector. The counter-statement does not necessarily hold (Sutanto and Warwick 1995); we therefore propose the following approach.

• Monitor Q using one or more EWMA windows with time constants appropriate to expected faults. Set the detection threshold to a relatively low level of sensitivity.
• Monitor T² to indicate the validity of the Q statistic. Set the detection threshold to a relatively high level of sensitivity.

Because it can be shown that PCA applied in this fashion does not, in general, provide near-optimal models of plant constraints (Solari 1969), we should look to other approaches, e.g. regularized total least squares. The standard MSPC regression technology, PLS, may not serve well in this function, as it provides only indirect treatment of the errors-in-variables issue (Solari 1969). The errors-in-variables issue arises from noise on measured actuation and feedforward signals, and from model mismatch due to nonlinear behaviour. Further, if predictions of any given variable are to be regressed on all other variables, then a separate model is required for each output (Dunia et al. 1996). Here we use PLS to predict the process outputs from the input values; the analysis accordingly treats the inputs as dumb variables. Prediction errors from this model are then analysed using PCA. Following the findings above, we propose the following approach.

• Monitor Q of the inputs to indicate the validity of the regression model. Set the detection threshold to a level of sensitivity which ensures detection of significant degradation in predictive capability.
• If the plant exhibits significantly nonlinear behaviour, monitor the T² of the inputs in a similar fashion. This will highlight when the inputs are operating outside the training region, which is an indicator of likely model failure if the plant is significantly nonlinear.

The issue is investigated more fully in Lennox et al. (1999) and Goulding et al. (1999). The fact that a change in T² and Q may characterize faults other than model degradation argues for rich excitation if full discrimination is to be achieved. Rich excitation is needed to ensure that false models of the plant are not evident in the residuals space, as poor excitation will leave subspaces of the inputs unexcited.

In contrast to the case of applying PCA to the plant measurements, there are no a priori grounds on which to distinguish between power within the subspace described by the principal components and that outside of it. PCA is used simply to provide a regularized estimate of the (inverse) correlation matrix. This analysis suggests monitoring a convex statistic of the form R_λ² = λT² + (1 − λ)Q. An optimal choice of the parameter λ will be a function of the data, and may be obtained from non-parametric approaches. The simplicity of the PCA method suggests a less sophisticated approach, however, and the arguments which led to the use of the T² statistic indicate using

    R²np = T² + ((n − np)/σ_Q²) Q,

where n denotes the number of variables in the data vector, np the number of principal components, and σ_Q² the variance of Q for the training data. We call the statistic R²np the 'regularized T² statistic'; a full analysis is presented in Goulding et al. (1999). The following approach for monitoring residuals is therefore proposed.

• Monitor the regularized T² statistic, R²np, of the residuals using one or more EWMA windows with time constants appropriate to expected faults. Set the detection threshold to a relatively low level of sensitivity.

For PCA applied separately to inputs and outputs for plant monitoring, we monitor T² as a static variable, and apply EWMA windows to Q. A similar approach is applied to the inner relationships for PLS. In practice we propose the use of non-parametric statistics, e.g. the bootstrap, kernel density or similar methods, to set suitable detection thresholds for all of the above approaches.

A strong relationship between MSPC and model- and signal-based fault detection schemes is apparent. The effectiveness of both PLS and PCA for detection of change depends on the provision of (one or more) good process models. While static linear models are the norm in MSPC, this analysis suggests a place for dynamic and nonlinear models where predictive ability would be enhanced by their use. The work of Wold et al. (1989), Dayal and MacGregor (1996), Wilson et al. (1997) and Lennox et al. (1998) suggests directions for investigation. Change in the subspace of the principal components may be interpreted as a signal-based analysis; in principle this analysis also may be dynamic, following classical PCA time-series approaches (Priestley et al. 1974) or nonlinear kernel methods (Husmeier and Taylor 1997).
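The sketch below assembles the residual-monitoring procedure proposed in this section under stated assumptions: an ordinary least-squares regression stands in for the PLS step, the PCA helpers (fit_pca, spe, t2) are reused from the earlier listings, and the synthetic data, EWMA time constant and bootstrap settings are illustrative rather than the authors' choices.

```python
# A sketch of conditioning on dumb variables and monitoring the resulting
# residuals with the regularized T² statistic. Least squares replaces PLS here;
# fit_pca, spe and t2 are the helpers defined in the earlier listings.

import numpy as np

def fit_conditioning_model(X_train, Y_train):
    """Regress the non-dumb variables Y on the dumb variables X (with intercept)."""
    Xa = np.column_stack([X_train, np.ones(len(X_train))])
    B, *_ = np.linalg.lstsq(Xa, Y_train, rcond=None)
    return B

def residuals(X, Y, B):
    """Prediction errors L of the conditioning model."""
    Xa = np.column_stack([X, np.ones(len(X))])
    return Y - Xa @ B

def regularized_t2(e, mu, sd, P, score_var, sigma_q2):
    """R²np = T² + ((n - np)/σ_Q²) Q for a single residual vector e."""
    n, n_p = len(e), P.shape[1]
    return t2(e, mu, sd, P, score_var) + (n - n_p) / sigma_q2 * spe(e, mu, sd, P)

def ewma(x, lam=0.1):
    """Exponentially weighted moving average of a 1-D sequence."""
    out, s = [], x[0]
    for v in x:
        s = lam * v + (1 - lam) * s
        out.append(s)
    return np.array(out)

def bootstrap_threshold(stat_train, level=0.99, n_boot=1000, seed=0):
    """Percentile-bootstrap detection threshold from the training statistic."""
    rng = np.random.default_rng(seed)
    qs = [np.quantile(rng.choice(stat_train, size=len(stat_train), replace=True), level)
          for _ in range(n_boot)]
    return float(np.mean(qs))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 4))                              # dumb variables (inputs)
    Y = X @ rng.normal(size=(4, 4)) + 0.1 * rng.normal(size=(500, 4))
    B = fit_conditioning_model(X, Y)
    L_train = residuals(X, Y, B)
    mu, sd, P, score_var = fit_pca(L_train, n_p=2)
    sigma_q2 = np.var([spe(e, mu, sd, P) for e in L_train], ddof=1)
    r2_train = np.array([regularized_t2(e, mu, sd, P, score_var, sigma_q2)
                         for e in L_train])
    threshold = bootstrap_threshold(ewma(r2_train), level=0.99)
    print("99% EWMA threshold on R2np:", threshold)
```

In practice the thresholds would be set from genuine normal-operation data, with one EWMA window per expected fault time-scale, as described above.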
4. Simulated analysis
Normal operating data from the simulated chemical plant depicted in figure C1 were generated, and 500 samples were used to train the PCA and PLS models. Data from the simulation are depicted in figure 1. Process faults were then introduced into the system, reflecting each of the seven classes of generic artefact described above. These artefacts included: faults affecting a controlled output sensor (FCY), an uncontrolled output sensor (FUY) and an actuator sensor (XF); an abnormal grade change (GC); a new schedule (NS); a process hardware failure (PF); and a change to the statistics of an unmeasured disturbance (UD). The sensor faults FCY, FUY and XF were introduced by adding a steadily increasing bias to the sensor measurements of y1, y4 and x1, respectively. The set-point of y1 was biased by a constant offset to simulate an abnormal grade change. A new schedule represents a shift in control priorities; examples include a change in operator shift on a manually controlled sub-plant, or an alteration of optimization priorities in a sub-plant under model predictive control. To simulate this effect, the controller cost function associated with the input variables x2 and x4 was altered. A process fault was introduced by imposing constraints on the actuator for x2. Finally, the variance of the disturbance variable, η2, was steadily ramped up from its normal value.

Figure 1. Example data from simulated system (x1 and y1 plotted against sample number).

Confidence bounds on R²np and Q were applied using two EWMA windows. The first window had a short time constant designed to pick up abrupt changes, which here included the grade change, the new schedule and the process hardware failure. The second window, with a longer time constant, was designed to highlight slow degradations, represented here by the sensor drifts and the change in the disturbance statistics. Confidence bounds on T² were applied at each time instant. Detection thresholds were set using the percentile bootstrap method (Efron and Tibshirani 1993), at 95% limits for T² and 99% limits for R²np and Q, respectively.

4.1. Results

For each PCA and PLS model, cross-validation indicated that the system contained three degrees of freedom. For each fault, 100 sets of data of 200 samples were generated, and the mean number of samples taken by each MSPC approach to detect the fault was recorded. The results are indicated in table 1. An asterisk indicates that the artefact was not detected, while a dagger (†) denotes that the artefact was detected intermittently. The T² and Q figures shown for the PLS inner relations relate to thresholds set for plant monitoring; those set for validating models are discussed in the text. Figures 2 and 3 illustrate results typical of the kind obtained from this analysis. The figures show the T² and Q charts obtained by applying PCA to the input and output data for the fault FCY. In each case the fault was introduced into the system at time 0. Here the Q chart is shown detecting the fault, while the T² chart succeeds intermittently. Our observations on the results are presented below.

Table 1. Summary of results for simulated system. Figures rounded to nearest 10 samples. (Mean detection delays, in samples, for each artefact FCY, FUY, XF, GC, NS, PF and UD under PCA on X, PCA on Y, PCA on X and Y, PCA on PLS prediction errors and the PLS inner relations; an asterisk marks an undetected artefact and a dagger an intermittent detection.)

Figure 2. T² chart for FCY (value of T² against sample number, with 95% confidence limit).

Figure 3. Q chart for FCY (value of Q against sample number, with 99% confidence limit).

With the exception of the artefact FUY, the results support the view that PCA-XY at least equals the performance of PCA-X and PCA-Y separately. In particular, PCA on Y is unable to detect FCY, XF, NS, PF and UD. This is because the controller is able to regulate the output variables at their set-point values, and hence the artefact is not observable in Y. The exception for FUY arises from the fact that y4 is an insertion of y3 and of the relatively small disturbance term η3 only, introducing a strong plant constraint of the kind outlined in section 3.2. A similar phenomenon results in the failure of the Q charts arising from PCA-X, from PCA on the PLS prediction errors and from analysis of the PLS inner relationships to detect a number of artefacts, e.g. FCY. These results highlight the need for a large number of correlated sensors to ensure rapid detection of arbitrary process artefacts. As expected, the Q statistic was more successful than T² in detecting every artefact for PCA-XY; this finding did not hold for PCA applied to X and Y separately, due to the specific characteristics of the artefacts induced, which tended to lie in the principal component space.

Despite the configuration of the PLS analysis, using the inputs as dumb variables, the PLS residuals signalled an artefact during the simulated grade changes. Inspection of the inner-relationship Q indicated that the inputs had a significant projection onto a subspace in which the PLS model was not trained. Detection based on the PLS model was therefore invalidated. A similar situation held for the new schedule. The results suggest that the PLS residuals statistic, R²np, may provide a better indication of changing process condition than PCA. This result supports our observations of section 3.2, indicating that the success of the MSPC approach rests on a good plant model.

Table 2 shows a list of putative causes for each artefact, ranked by contribution to T², Q and R²np. Figure 4 shows an example of the contributions to Q assigned to each variable for the fault FCY. The observed results support the view that the contributions may provide a useful indication of the cause of each type of artefact. Exceptionally, the approach was unable to distinguish between artefacts affecting y3 and y4; the strong statistical relationship between these variables observed above prevents the method from distinguishing between them. The PLS model was not capable of distinguishing between X and Y contributions, due to its configuration. Otherwise, the results of PCA on X are qualitatively identical to those obtained through analysis of the inner relationships of the PLS model, for both detection and assignment of cause.

Table 2. Summary of results for simulated system.

Artefact | PCA on X and Y | PCA on PLS prediction errors | PLS inner relations
FCY      | y1; y2; y3     | y1                           | x1; x3
FUY      | y4; y3         | y3; y4                       | No error detected
XF       | x1; y1; y2     | y1; y2                       | x1; x2; x4
GC       | y1             | y1                           | x3; x1
NS       | x2; x4; x1     | y3; y4; y2; y1               | x2; x3; x4
PF       | x2; x4; x1     | y3; y4; y1                   | x2; x3; x4
Figure 4. Individual prediction errors for FCY (three most significant contributors only): individual SPE of x1, x3 and y1 against sample number.
5. Application to industrial data

Data were obtained from a series of reactors in a chlorocarbon production plant operated by ICI Chemicals. These reactors form the heart of a Per/Tri process, converting chlorine or HCl and oxygen to perchloroethylene and trichloroethylene. The reactors consist of a large shell-and-tube heat exchanger filled with a catalyst. The heat of reaction, produced in the reactor, is removed by oil. This oil, which fills the walls of the reactor, is then used to raise steam for use elsewhere on the site.

For this study a little over three months of data were supplied. These data consisted of six cause variables and a total of 40 effect variables, many of which are temperature measurements made around the reactor and are therefore known to be highly correlated. The effect variables are fed back to the cause variables by operator feedback. PCA and PLS models were developed using approximately 70% of these data. The remaining samples, which included data collected during a reactor fire that led to a reactor shutdown, were passed through these models and the results analysed.

Figure 5 shows the T² chart obtained when PCA was performed using the combined input and output data. The graph clearly shows the T² value increasing at approximately sample number 870 and exceeding the 99% confidence limits, indicating that an artefact has entered the system. This is followed at approximately sample number 990 by a further rise in the value of T². Monitoring of the Q chart, displayed in figure 6, showed no increase in error until sample number 990, when a large rise in error was detected. This result suggests that at sample number 870 a non-critical artefact, e.g. a grade change or new schedule, entered the system, and that at sample number 990 a further artefact was introduced. These results were confirmed by the PLS analysis. Investigation of the contributions to the T² and Q values showed that a number of tube temperatures were contributing significantly to the scores and residuals during the process artefact at sample number 990.

Figure 5. T² chart for industrial system (value of T² against sample number).

Figure 6. Q chart for industrial system (value of Q against sample number).

Discussions with process operators confirmed the results obtained using MSPC. A grade change occurred at sample number 870, and the fire in the reactor began at approximately sample number 990. The results obtained from this industrial study support several of the findings presented above.

6. Summary and conclusions
The impact of various classes of generic abnormality in the operation of continuous process plants on MSPC monitoring has been investigated. It has been shown how the effectiveness of these approaches may be understood in terms of model- and signal-based fault detection methods, and how the MSPC methods may be configured in order to maximize their effectiveness. A statistic, which we call the regularized T² statistic, has been proposed for use in model-based residuals analysis. Our results indicate its potential to provide sensitive detection of changes in process condition.
The issue of attributing cause to a change in process condition has been investigated. Our results support the view that the contributions to the T² and Q statistics provide useful information in this regard, and indicate the need for further investigation of their robustness under a variety of real-life process conditions.

Acknowledgements

The authors gratefully acknowledge the support of Foxboro Ltd, Predictive Control Ltd and Control Technology Centre Ltd in the funding of this work. We are grateful also to ICI Chemicals Ltd for advice and the provision of data from the Per/Tri plant.

Appendix A. Principal component analysis and partial least squares

PCA is a method used to analyse the covariance of a set of plant variables. The approach transforms the matrix of variables, [Z], into a matrix of mutually uncorrelated variables, t_k. These variables, called principal components (PCs), are transforms of the original data into a new basis defined by a set of orthogonal loading vectors, p_k. The individual values of the principal components are called scores. The transformation is defined by
    [Z] = Σ_{k=1}^{np} t_k p_kᵀ + E
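As a final illustration, the short sketch below forms this decomposition for a small synthetic data matrix; the data and the choice of the number of components are assumptions of the example, and the residual matrix E is near zero only because the example data have exactly that rank.

```python
# A minimal check of the decomposition above: with loadings P and scores T from an
# SVD-based PCA, [Z] (mean-centred) is reconstructed as T Pᵀ plus the residual E.

import numpy as np

rng = np.random.default_rng(2)
Z = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 6))   # illustrative data of rank 3
Zc = Z - Z.mean(axis=0)

n_p = 3
U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
P = Vt[:n_p].T                 # orthogonal loading vectors p_k
T = Zc @ P                     # scores t_k
E = Zc - T @ P.T               # residual matrix

print("max |E| :", np.abs(E).max())   # near zero when n_p captures all the variation
```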