

IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 19, NO. 5, SEPTEMBER 2011

Generalized Reconstruction-Based Contributions for Output-Relevant Fault Diagnosis With Application to the Tennessee Eastman Process

Gang Li, Carlos F. Alcala, S. Joe Qin, Senior Member, IEEE, and Donghua Zhou, Senior Member, IEEE

Abstract—Multivariate statistical process monitoring technologies, including principal component analysis (PCA) and partial least squares (PLS), have been successfully applied in many industrial processes. However, in practice, many PCA alarms do not lead to quality deterioration because of process control and recycle loops in process flowsheets, which undermines the reliability of PCA-based monitoring methods. Therefore, one is more interested in monitoring the variations related to quality data and in detecting the faults that affect quality data. Recently, a total projection to latent structures (T-PLS) model has been reported to detect output-relevant faults. In this paper, a generalized reconstruction-based contribution (RBC) method with the T-PLS model is proposed to diagnose the fault type for output-relevant faults. Furthermore, the geometrical property of the generalized RBC is analyzed. A detailed case study on the Tennessee Eastman process is presented to demonstrate the use of the proposed method with and without prior knowledge.

Index Terms—Output-relevant fault diagnosis, reconstruction-based contribution, total projection to latent structures.

NOMENCLATURE

Contribution of variable $i$ to a specific detection index: $\mathrm{Cont}_i^{\mathrm{Index}}$.
RBC of variables or fault directions $\Xi_i$ to a specific detection index: $\mathrm{RBC}_{\Xi_i}^{\mathrm{Index}}$.
Combined index of $T_y^2$ and $Q_r$: $\varphi$ (shown with a different glyph in figures).
The control limit of $\varphi$: $\zeta^2$. When $\varphi > \zeta^2$, an output-relevant fault is detected.
The reconstruction limit in (15), which is used to reduce the candidates of the actual fault.
The control limit for each RBC, given in (17).
Fault directions of the known candidate set, and the actual fault direction: $\Xi_i$, $\Xi_j$.
Reconstructed value with $\Xi_i$, and reconstructed value with $\Xi_j$: $x_i$, $x_j$.
Actual fault magnitude, estimated fault magnitude using $\Xi_i$, and estimated fault magnitude using $\Xi_j$: $f$, $f_i$, $f_j$.
The output-relevant part of a matrix or vector: $\tilde{(\cdot)}$.
MP pseudoinverse of a matrix: $(\cdot)^{+}$.

Manuscript received January 11, 2010; revised June 22, 2010; accepted August 16, 2010. Manuscript received in final form August 23, 2010. Date of publication October 14, 2010; date of current version August 17, 2011. Recommended by Associate Editor Z. Gao. This work was supported in part by National 973 Project under Grant 2010CB731800 and Grant 2009CB320602 and by the NSFC under Grant 60721003, Grant 60736026, Grant 60931160440.
G. Li and D. Zhou are with the Department of Automation, TNList, Tsinghua University, Beijing 100084, China (e-mail: [email protected]).
C. F. Alcala is with the Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, CA 90089 USA.
S. J. Qin is with Departments of Chemical and Electrical Engineering, University of Southern California, Los Angeles, CA 90089 USA (e-mail: sqin@usc.edu).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCST.2010.2071415

I. INTRODUCTION

FAULT diagnosis and fault-tolerant design have become a significant consideration for complex industrial processes and attract increasing attention from engineers and researchers [1]–[4]. In the area of multivariate statistical process monitoring, principal component analysis (PCA) and partial least squares (PLS) have been successfully applied to the detection of abnormal situations in industrial processes [5]–[7]. These methods reduce the dimensionality of the measurement space by projecting measurement data onto a low-dimensional latent space. While PCA models capture variations in input data in descending order of variance, PLS models capture the variations in input data that are related to output data in descending order of covariance. Input data consist of process variables such as temperature, pressure, and flow rate, which are collected frequently with negligible time delay. Output data consist of key quality variables such as composition, granularity, and particle size, which are collected infrequently with considerable delay. If faults in the input data are of concern, one should use PCA to build a correlation-based model and monitor the process. To detect faults that have an impact on output data, which are referred to as output-relevant faults, one should use PLS to monitor the process. Those faults that have no impact on output data are called output-irrelevant faults. Early papers in the process monitoring area have stated clearly when PCA- or PLS-based monitoring methods should be used [6]–[8]. When the output data are unavailable or unreliable, PCA-based monitoring is the only choice. Otherwise, PLS-based monitoring is preferred to detect and diagnose output-relevant faults, although PCA-based monitoring can still be applied.

1063-6536/$26.00 © 2010 IEEE


Statistical process control charts based on PLS have been studied frequently in the past. Kresta et al. laid out the basic methodology to detect faults related to output data in continuous processes with PLS models [5]. MacGregor et al. proposed monitoring methods with multiblock PLS models for fault diagnosis [7]. Li et al. revealed the geometric property of the PLS decomposition structure in input data, and compared different PLS models for the purpose of process monitoring [9]. As reported in the literature, PLS divides variations of input data into a principal part and a residual part. A traditional way of using a PLS model is to detect output-relevant faults in the principal PLS factor space and output-irrelevant faults in the residual space [7]. However, there are two problems with the decomposition of the traditional PLS model. One problem is that the principal PLS factor space still contains variations that are uncorrelated with output data. The other problem is that the residual space still contains a large variance, since the input variance is not minimized in PLS. A residual space with a large variance is not appropriate to be monitored with a $Q$-type index, which is also known as the squared prediction error (SPE). To clearly separate output-relevant faults from output-irrelevant faults, Zhou et al. proposed a total PLS (T-PLS) model for output-relevant process monitoring [10]. The T-PLS model divides input data into four parts, which improves the monitoring performance for output-relevant faults. Once a fault is detected, a diagnostic tool is needed to identify faulty variables. There are several methods for fault diagnosis based on historical data, such as discriminant analysis [11], [12], pattern matching using dissimilarity factors [13], [14], and structured residual-based approaches [15]–[17].
In the statistical process monitoring area, one popular approach is to use contribution plots to identify the root cause [7], [18]–[20], which does not need prior fault information. Li et al. proposed contribution plots with the T-PLS model for output-relevant fault diagnosis [21]. The assumption behind contribution plots is that faulty variables tend to make higher contributions to a fault detection index. Although they have found wide application in various processes, contribution plots suffer from smearing, which can lead to an incorrect identification of the cause even for sensor faults [18], [22]. Dunia and Qin proposed to characterize faults by fault directions or subspaces and used the fault direction information to perform fault identification via reconstruction [23]. A nice feature of this method is that faults with known fault directions can be diagnosed without ambiguity, but it requires knowing the fault direction. As an improvement, Alcala and Qin proposed a reconstruction-based contribution (RBC) to perform fault diagnosis with PCA models, which inherits the merit of traditional contribution plots and has a solid theoretical foundation for sensor faults [22]. However, Alcala and Qin did not discuss how to make use of historical fault information for fault diagnosis when it is available.

In this paper, inspired by the work of Dunia and Qin [23], a generalized RBC method is proposed to extend RBC to the case of known fault directions, so that fault information can be used when it is available. Moreover, this work is focused on output-relevant fault diagnosis and therefore takes T-PLS as the process data model. The remainder of this paper is organized as follows. T-PLS-based fault detection methods are reviewed in Section II. In Section III, a generalized RBC method based on T-PLS models is proposed to diagnose output-relevant faults. Further, the geometrical meaning of the generalized RBC is analyzed. In Section IV, a case study on the Tennessee Eastman process (TEP) is used to demonstrate the proposed method in detail. Finally, conclusions are given in the last section.

II. T-PLS-BASED FAULT DETECTION

A. Total Projection to Latent Structures (T-PLS)

Consider the situation where there are $m$ process variables measured with $n$ samples. These data can be arranged in an input matrix $X \in \mathbb{R}^{n \times m}$. Measurements on $p$ quality and productivity variables with $n$ samples can be expressed as an output matrix $Y \in \mathbb{R}^{n \times p}$. In practice the output data are more sparse than the input data. To collect the data for modeling, only input data that correspond to the available output data are selected in the $X$ matrix. The objective of this paper is to monitor the variations in input data that are relevant to the output data; therefore, a PLS analysis using both $X$ and $Y$ is preferred. To reflect the correlation information of variables, columns of $X$ and $Y$ are usually centered to zero mean and scaled to unit variance. In the traditional PLS model, data matrices $X$ and $Y$ are decomposed as follows [7]:

$$X = TP^T + E, \qquad Y = TQ^T + F \tag{1}$$

where $T \in \mathbb{R}^{n \times A}$ is the score matrix, and $P \in \mathbb{R}^{m \times A}$, $Q \in \mathbb{R}^{p \times A}$ are the loading matrices for $X$ and $Y$, respectively. $A$ is the number of PLS factors, which is usually determined by cross validation [7]. Matrices $E$, $F$ are the residuals of $X$ and $Y$, respectively. In the PLS algorithm, a weight matrix $W$ is used to calculate the score matrix $T$ iteratively. Let $R = W(P^TW)^{-1}$; then $T = XR$.

As the score matrix $T$ in PLS includes variations uncorrelated with $Y$, and $E$ still contains large variance, the T-PLS algorithm carries out a further decomposition [10]

$$X = T_yP_y^T + T_oP_o^T + T_rP_r^T + E_r, \qquad Y = T_yQ_y^T + F \tag{2}$$

where $T_y$, $T_o$, $T_r$ are score matrices, and $P_y$, $P_o$, $P_r$ are the corresponding loading matrices. $Q_y$ is the new loading matrix for $Y$ corresponding to $T_y$. $E_r$ is the new residual matrix after a PCA decomposition of $E$. $A_y$ is the number of output-relevant components, and $A_r$ is the number of output-irrelevant components contained in $E$. The detailed T-PLS algorithm is given in Appendix I.

In the T-PLS model, $T_y$ represents the variations completely related to $Y$ in the original $T$, $T_o$ represents variations orthogonal to $Y$ in the original $T$, $T_r$ is the principal scores of the original $E$, and $E_r$ is the residual of $X$ that is not excited in the normal process data. Note that input data are in $\mathbb{R}^m$, which is referred to as the input space. T-PLS decomposes the input space into four subspaces, which correspond to different parts of the T-PLS model. Table I describes the properties of these subspaces.
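The decomposition in (1) and (2) can be illustrated with a short NumPy sketch. It is only a minimal illustration under stated assumptions, not the authors' implementation: the function names `nipals_pls`, `pca`, and `tpls` are hypothetical, the PLS step is a textbook NIPALS loop, PCA is done by SVD without re-centering, and $A_y$ is taken as the numerical rank of $Q$.

```python
import numpy as np

def nipals_pls(X, Y, A, tol=1e-10, max_iter=500):
    """Textbook NIPALS PLS: X = T P^T + E, Y = T Q^T + F, with T = X R."""
    E, F = X.astype(float).copy(), Y.astype(float).copy()
    n, m = E.shape
    T, P, W = np.zeros((n, A)), np.zeros((m, A)), np.zeros((m, A))
    Q = np.zeros((F.shape[1], A))
    for a in range(A):
        u = F[:, 0].copy()                      # start from one output column
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)              # unit-norm weight vector
            t = E @ w                           # input score
            q = F.T @ t / (t @ t)               # output loading
            u_new = F @ q / (q @ q)             # output score
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = E.T @ t / (t @ t)                   # input loading
        E -= np.outer(t, p)                     # deflate X
        F -= np.outer(t, q)                     # deflate Y
        T[:, a], P[:, a], W[:, a], Q[:, a] = t, p, w, q
    R = W @ np.linalg.inv(P.T @ W)
    return T, P, Q, R, E                        # E is the PLS residual of X

def pca(M, k):
    """First k principal directions of M via SVD: scores M @ Pk and loadings Pk."""
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    Pk = Vt[:k].T
    return M @ Pk, Pk

def tpls(X, Y, A, Ar):
    """Further decomposition of a PLS model into the four T-PLS parts."""
    T, P, Q, R, E = nipals_pls(X, Y, A)
    Ay = np.linalg.matrix_rank(Q)               # number of output-relevant components
    Ty, Qy = pca(T @ Q.T, Ay)                   # Y_hat = Ty Qy^T
    X_hat = T @ P.T
    Py = X_hat.T @ Ty @ np.linalg.inv(Ty.T @ Ty)
    To, Po = pca(X_hat - Ty @ Py.T, A - Ay)     # part of X_hat orthogonal to Ty
    Tr, Pr = pca(E, Ar)                         # principal part of the PLS residual
    Er = E - Tr @ Pr.T
    return {"Ty": Ty, "Py": Py, "Qy": Qy, "To": To, "Po": Po,
            "Tr": Tr, "Pr": Pr, "Er": Er, "P": P, "Q": Q, "R": R}
```

For centered and scaled `X` and `Y`, the four parts returned by `tpls` add back up to `X`, and the columns of `Ty` are orthogonal to those of `To`, which is the sense in which the output-relevant and output-orthogonal variations are separated.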


TABLE I
MEANING OF DIFFERENT SUBSPACES [10]

T-PLS-based monitoring methods can increase the detection rate and reduce the false alarm rate for output-relevant faults.

B. Fault Detection Indices

Traditional PLS-based methods monitor output-relevant variations with a $T^2$-type index and output-irrelevant variations with a $Q$-type index, respectively [7]. However, in the PLS model, $T^2$ still monitors variations unrelated to $Y$, and $Q$ contains a large variability. Thus, T-PLS-based methods are used to monitor output-relevant faults in this paper. As output variables are infrequently measured and are available only after a significant time delay in practice, input samples are considered for online fault detection. Suppose a new input sample is denoted by $x$; then the scores and the residual of $x$ are calculated as [10]

$$\begin{aligned}
t_y &= Q_y^T Q R^T x \\
t_o &= P_o^T (P - P_y Q_y^T Q) R^T x \\
t_r &= P_r^T (I - P R^T) x \\
\tilde{x}_r &= (I - P_r P_r^T)(I - P R^T) x
\end{aligned} \tag{3}$$

where

$$R = W(P^T W)^{-1}. \tag{4}$$

In T-PLS-based methods, $t_y$, $t_o$, and $t_r$ contain the systematic process variation and are suitable to be monitored using the $T^2$ index. As $\tilde{x}_r$ represents the residual part, it is suitable to use the $Q$ index. Table II lists calculations of fault detection indices and their control limits, where control limits are obtained from normal process data.

TABLE II
FAULT DETECTION INDICES AND CONTROL LIMITS

Where $\Lambda_y = \frac{1}{n-1}T_y^T T_y$, $\Lambda_o = \frac{1}{n-1}T_o^T T_o$, $\Lambda_r = \frac{1}{n-1}T_r^T T_r$, $S$ is the sample variance of $Q_r$ and $\mu$ is the sample mean of $Q_r$, and $\alpha$ is the significance level of the $\chi^2$ and $F$ distributions.

Only abnormal situations in the subspaces associated with $t_y$ and $\tilde{x}_r$ affect output data. Therefore, $T_y^2$ and $Q_r$ need to be monitored simultaneously. However, for simplicity, one may prefer to observe one index rather than two indices for fault detection. Li et al. followed Yue and Qin [24] to propose a combined index for monitoring output-relevant faults as follows, which incorporates $T_y^2$ and $Q_r$ in a balanced way [25]:

$$\varphi = \frac{T_y^2}{\tau_y^2} + \frac{Q_r}{\delta_r^2} = x^T \Phi x \tag{5}$$

where

$$\Phi = \frac{R Q^T Q_y \Lambda_y^{-1} Q_y^T Q R^T}{\tau_y^2} + \frac{(I - P R^T)^T (I - P_r P_r^T)(I - P R^T)}{\delta_r^2} \tag{6}$$

and $\tau_y^2$, $\delta_r^2$ denote the control limits of $T_y^2$ and $Q_r$, respectively. From (5), $\varphi$ is a quadratic function of $x$; thus the control limit for $\varphi$ can be calculated as follows [26]:

$$\zeta^2 = g \chi^2_{\alpha}(h) \tag{7}$$

where

$g = \mathrm{tr}[(S\Phi)^2] / \mathrm{tr}(S\Phi)$, $h = [\mathrm{tr}(S\Phi)]^2 / \mathrm{tr}[(S\Phi)^2]$, and $S$ is the estimated covariance of input variables. In this paper, $\varphi$ is taken as the detection index for output-relevant faults, although $T_y^2$ and $Q_r$ could be used simultaneously for monitoring if preferred.

III. GENERALIZED RECONSTRUCTION-BASED CONTRIBUTION FOR FAULT DIAGNOSIS

A. Contribution Plots and RBC

Once a fault is detected, a diagnosis tool is required to determine the fault type. Contribution plots are very popular for process diagnosis because they do not require specific fault information. For a T-PLS model, Li et al. proposed contribution plots, considering contributions to $T_y^2$ and $Q_r$ in a unified form [21]. Alcala and Qin summarized various contribution formulations in the literature and unified them under three categories: complete decomposition contributions, partial decomposition contributions, and RBC [20], [27]. Here, the framework of complete decomposition contributions is followed. Contributions (Cont) to a fault detection index can be expressed as

$$\mathrm{Index}(x) = x^T \Gamma x = \|\Gamma^{1/2} x\|^2 \tag{8}$$

$$\mathrm{Cont}_i^{\mathrm{Index}} = (\xi_i^T \Gamma^{1/2} x)^2 \tag{9}$$

where Index is a fault detection index such as $T_y^2$, $Q_r$, or $\varphi$, $\Gamma$ represents a specific matrix defined by the index, $\mathrm{Cont}_i^{\mathrm{Index}}$ denotes the contribution of variable $i$ to Index, $\xi_i^T \Gamma^{1/2}$ is the $i$th row of $\Gamma^{1/2}$, and $\xi_i$ is the $i$th column of the identity matrix $I$. Table III lists the matrix $\Gamma$ for different indices in the context of T-PLS-based monitoring.

TABLE III
$\Gamma$ FOR VARIOUS DETECTION INDICES

The method of contribution plots assumes that variables with the largest contributions to a fault detection index are most likely the faulty variables. However, this assumption does not have a solid theoretical basis and may give a misleading diagnosis in some cases. Even for a sensor fault, traditional contribution plots may give a misleading result due to the smearing effect [12]. To overcome the problem, Alcala and Qin proposed an RBC method to diagnose faulty variables [22]. The method combines contribution analysis and reconstruction-based identification, which gives improved diagnosis results. The RBC for variable $i$ is defined as

$$\mathrm{RBC}_i^{\mathrm{Index}} = \frac{(\xi_i^T \Gamma x)^2}{\xi_i^T \Gamma \xi_i} \tag{10}$$

where all parameters are the same as in (9). It can be seen that RBC and Cont use the same information but are calculated differently.

B. Generalized RBC Plots With T-PLS Models

Although RBC has a better performance than traditional contributions, it does not make use of fault direction information when it is available. In this section, a generalized RBC method is proposed to extend RBC to the case of known fault directions, which enables the RBC method to work with or without prior knowledge of faults. To focus on the output-relevant faults, the proposed method is based on the T-PLS model. When a fault occurs, the faulty sample is represented as

$$x = x^* + \Xi f \tag{11}$$

where $x$ represents the input sample value under the abnormal situation, $x^*$ represents the fault-free sample value, $\Xi$ is an orthonormal matrix that spans the fault subspace, and $f$ denotes the fault magnitude. Collect all known fault directions into a candidate fault set $\{\Xi_1, \ldots, \Xi_k\}$, where $k$ is the number of known faults. Given an arbitrary fault direction $\Xi_i$ from the fault set, the faulty sample can be corrected by $x_i = x - \Xi_i f_i$. Thus, the reconstructed detection index can be calculated by

$$\varphi(x_i) = \|\Phi^{1/2}(x - \Xi_i f_i)\|^2 = \|\tilde{x} - \tilde{\Xi}_i f_i\|^2. \tag{12}$$

The objective of reconstruction is to find $f_i$ so that (12) is minimized. By solving the unconstrained optimization problem, the analytical solution can be obtained [24], [25]

$$f_i = (\tilde{\Xi}_i^T \tilde{\Xi}_i)^{+} \tilde{\Xi}_i^T \tilde{x} = \tilde{\Xi}_i^{+} \tilde{x} \tag{13}$$

where $(\cdot)^{+}$ denotes the Moore-Penrose (MP) pseudoinverse of a matrix, $\tilde{x} = \Phi^{1/2} x$, and $\tilde{\Xi}_i = \Phi^{1/2} \Xi_i$. In order to determine the effect of the reconstruction along different fault directions, the generalized RBC is defined as follows, by substituting solution (13) into (12).

Definition 1: Generalized RBC of each $\Xi_i$ to index $\varphi$ is defined as follows:

$$\mathrm{RBC}_{\Xi_i}^{\varphi} = \|\tilde{\Xi}_i f_i\|^2 = \tilde{x}^T \tilde{\Xi}_i \tilde{\Xi}_i^{+} \tilde{x} = x^T \Phi \Xi_i (\Xi_i^T \Phi \Xi_i)^{+} \Xi_i^T \Phi x \tag{14}$$

where $\tilde{\Xi}_i^{+} = (\tilde{\Xi}_i^T \tilde{\Xi}_i)^{+} \tilde{\Xi}_i^T$ is a property of the MP pseudoinverse.

Suppose the actual fault direction $\Xi$ in (11) is from a known fault set and is denoted by $\Xi_j$. Then, the following theorem can be used for identifying the fault type from the known fault set.

Theorem 1: If $\tilde{\Xi}_j$ has full column rank, the $\mathrm{RBC}_{\Xi_i}^{\varphi}$ with $i = 1, \ldots, k$ have the following two properties.
1) Let $\varphi(x) - \zeta^2$ represent the reconstruction limit, then

$$\mathrm{RBC}_{\Xi_j}^{\varphi} \ge \varphi(x) - \zeta^2. \tag{15}$$

2) If a faulty sample has the form $x = \Xi_j f$, then

$$\mathrm{RBC}_{\Xi_j}^{\varphi} \ge \mathrm{RBC}_{\Xi_i}^{\varphi} \quad \text{for all other } i. \tag{16}$$

The proof of Theorem 1 is given in Appendix II. Some remarks on the generalized RBC follow.

Remark 1: Similar to RBC, generalized RBC can be calculated for other fault detection indices, such as $T_y^2$ and $Q_r$. The usage and properties of generalized RBC for these indices are similar. In this paper, $\varphi$ is taken to detect and diagnose output-relevant faults.

Remark 2: The first property in Theorem 1 does not guarantee unique identification of the fault type, but it helps to search for the actual fault in a reduced candidate set instead of the whole set.

Remark 3: The second property in Theorem 1 describes an important relation between the RBC of the actual fault direction and the RBCs of other fault directions. The result applies to the case when the fault magnitude is large, so that $x^*$ can be omitted compared to $\Xi_j f$.

Remark 4: When there is no fault, RBCs for all directions are uneven. Thus, it is necessary to derive a control limit for each RBC. Control limits for contributions have been studied by many researchers [18], [19], [22]. Here, the control limits are determined according to [26]: the control limit of $\mathrm{RBC}_{\Xi_i}^{\varphi}$ is

$$g_i \chi^2_{\alpha}(h_i) \tag{17}$$

where $\Psi_i = \Phi \Xi_i (\Xi_i^T \Phi \Xi_i)^{+} \Xi_i^T \Phi$, $g_i = \mathrm{tr}[(S\Psi_i)^2] / \mathrm{tr}(S\Psi_i)$, $h_i = [\mathrm{tr}(S\Psi_i)]^2 / \mathrm{tr}[(S\Psi_i)^2]$.
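The reconstruction along a candidate direction and the resulting generalized RBC can be checked numerically. The sketch below is a hedged illustration, not the authors' code: it assumes the detection index is a quadratic form $x^T \Phi x$ with a symmetric positive semidefinite $\Phi$, and the function name `generalized_rbc` is hypothetical.

```python
import numpy as np

def generalized_rbc(x, Phi, Xi):
    """RBC of fault direction(s) Xi (m x l) to the index x^T Phi x.

    Returns the RBC and the sample reconstructed along Xi.
    Assumes Phi is symmetric positive semidefinite.
    """
    G = Xi.T @ Phi @ Xi                 # Xi~^T Xi~ with Xi~ = Phi^{1/2} Xi
    b = Xi.T @ Phi @ x                  # Xi~^T x~ with x~ = Phi^{1/2} x
    f = np.linalg.pinv(G) @ b           # least-squares fault magnitude estimate
    rbc = float(b @ f)                  # x^T Phi Xi (Xi^T Phi Xi)^+ Xi^T Phi x
    return rbc, x - Xi @ f
```

Under these assumptions the RBC equals the drop in the index produced by the reconstruction, so it can never exceed the index itself, and for a sample of the form $x = \Xi_j f$ the RBC along $\Xi_j$ recovers the whole index, which is why no other direction can score higher.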


Fig. 1. Relation between reconstructed data and fault direction.

In a process with $m$ variables, if a fault happens in variable $i$, the fault direction is $\xi_i$. The RBC for each variable, called variable RBC, can be obtained as

$$\mathrm{RBC}_i^{\varphi} = \frac{(\xi_i^T \Phi x)^2}{\phi_{ii}} \tag{18}$$

where $\phi_{ii}$ means the $i$th diagonal element of $\Phi$. The result is consistent with (10) when $\varphi$ is chosen as the fault detection index. RBCs with known process fault directions are called fault-specific RBCs.

Remark 5: When no prior fault directions are available, only variable RBCs are used. There are two considerations for selecting the significant variables responsible for the faults. First, the variable RBCs for a detection index should be larger than their corresponding control limits. However, because of the smearing effect, variables that are normal can have larger contributions than their control limits. Thus, as the second and preferred approach, the variables with relatively larger contributions are more likely to be significant contributors. It is up to the expert to decide how many contributing variables are significant.

C. Geometric Interpretation of RBC

When $\tilde{\Xi}_i$ has full column rank, $\tilde{\Xi}_i^{+}$ reduces to $(\tilde{\Xi}_i^T \tilde{\Xi}_i)^{-1} \tilde{\Xi}_i^T$. Let $\tilde{x}_i = \Phi^{1/2} x_i$; noting $x_i = x - \Xi_i f_i$ and using (13), the following relation can be obtained:

$$\tilde{\Xi}_i^T \tilde{x}_i = \tilde{\Xi}_i^T (\tilde{x} - \tilde{\Xi}_i f_i) = 0. \tag{19}$$

This property indicates that $\tilde{x}_i$ is orthogonal to $\tilde{\Xi}_i$, as described in Fig. 1, which means $\tilde{\Xi}_i f_i$ is the orthogonal projection of $\tilde{x}$ onto the subspace $\mathrm{span}\{\tilde{\Xi}_i\}$. Further, the angle $\theta_i$ can be measured by

$$\cos^2 \theta_i = \frac{\|\tilde{\Xi}_i f_i\|^2}{\|\tilde{x}\|^2} = \frac{\mathrm{RBC}_{\Xi_i}^{\varphi}}{\varphi(x)}. \tag{20}$$

Result (20) means that RBC measures the angle between the output-relevant part of the faulty sample and the subspace $\mathrm{span}\{\tilde{\Xi}_i\}$. The fault that has the highest RBC also has the smallest angle. It is also clear from (20) that no RBC exceeds $\varphi(x)$.

D. Summary of the Generalized RBC Method

To sum up, the whole procedure of the generalized RBC diagnosis method with the T-PLS model is given as follows.
1) Obtain normal process data $X$ and quality data $Y$, center each column of $X$ and $Y$ to zero mean and scale them to unit variance, and build a T-PLS model with proper parameters according to Appendix I.
2) Extract the output-relevant part of the fault direction $\tilde{\Xi}_i$ for known output-relevant faults by performing SVD on $\tilde{X}_f^T$, where $\tilde{X}_f$ means the output-relevant part of historical faulty data [25], [28]. An example is shown in the following case study.
3) Perform online fault detection using $\varphi$ in (5) and the control limit in (7).
4) When an output-relevant fault is detected, compare variable RBCs and fault-specific RBCs with the reconstruction limit in (15).
   i) If one or more RBCs are above the reconstruction limit, the actual fault is identified as one of these fault directions. If the reconstructed samples with a direction are all within control, that fault direction is the actual fault.
   ii) If none of the RBCs exceed the reconstruction limit, focus on large RBCs which exceed their control limits for contributing variables or fault directions. The fault must be a new fault, and the faulty data should be used to extract its fault direction for further use.

IV. CASE STUDY ON THE TENNESSEE EASTMAN PROCESS

In this section, a detailed case study on the Tennessee Eastman process (TEP) is carried out to show the usage and effectiveness of the proposed diagnosis methods.

A. Process Description

TEP was created by the Eastman Chemical Company to provide a realistic industrial process for evaluating process control and monitoring methods [29]. The process consists of five major units: a reactor, condenser, compressor, separator, and stripper; and it contains eight components: A, B, C, D, E, F, G, and H. The gaseous reactants A, C, D, and E and the inert B are fed to the reactor, where the liquid products G and H are formed. The species F is a by-product in the process. The process flowsheet is shown in Fig. 2. The process used here is operated under closed-loop control. TEP has been widely used as a benchmark process for evaluating process diagnosis methods such as PCA, support vector machines, and Fisher discriminant analysis (FDA). PLS-based methods have also been applied to TEP [30], [31]. Chiang et al. reviewed fault detection and diagnosis methods of multivariate statistical process monitoring, such as PCA, FDA, PLS, and canonical variate analysis (CVA), and compared them using the case study of TEP [32].

TEP contains two blocks of variables: the XMV block of 12 manipulated variables and the XMEAS block of 41 measured variables. Process measurements are sampled with an interval of 3 min. Nineteen composition measurements are sampled with time delays that vary from six minutes to fifteen minutes. This time delay has a potentially critical impact on product quality control within the plant [30]. It implies that the fault effect on product quality cannot be detected using output data until the next sample of $Y$ is available. During this time, the products are produced with lower quality. PLS-based monitoring methods can detect the fault using process input data that are correlated with $Y$; thus they are able to detect output-relevant faults before the output data $Y$ are measured.
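Detecting a fault from an input sample alone amounts to evaluating a quadratic form in $x$. The sketch below assembles the matrix of such a form, assuming the combined index has the Yue-Qin structure $T_y^2/\tau^2 + Q_r/\delta^2$; the names `combined_index_matrix`, `is_faulty`, `tau2`, and `delta2` are hypothetical, and the two limits are taken as given rather than computed from $F$ and $\chi^2$ quantiles.

```python
import numpy as np

def combined_index_matrix(P, Q, R, Qy, Pr, Lam_y, tau2, delta2):
    """Matrix Phi such that the combined index of a sample x is x^T Phi x.

    P, Q, R come from the PLS step; Qy, Pr from the further T-PLS step;
    Lam_y is the sample covariance of the output-relevant scores;
    tau2 and delta2 are assumed control limits of T_y^2 and Q_r.
    """
    m = P.shape[0]
    Gy = R @ Q.T @ Qy                        # t_y = Gy^T x
    C = np.eye(m) - P @ R.T                  # PLS residual of a sample: C x
    Cr = (np.eye(m) - Pr @ Pr.T) @ C         # T-PLS residual of a sample: Cr x
    return Gy @ np.linalg.inv(Lam_y) @ Gy.T / tau2 + Cr.T @ Cr / delta2

def is_faulty(x, Phi, limit):
    """Flag a sample whose index exceeds the given control limit."""
    return float(x @ Phi @ x) > limit
```

Because only $x$ enters the index, the check can run at the input sampling rate, without waiting for the delayed composition measurement.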


Fig. 2. Process flowsheet for the Tennessee Eastman process.

B. T-PLS Modeling and Fault Direction Extraction of Output-Relevant Faults

In this study, the composition of G in stream 9, i.e., MEAS(35), is chosen as the output variable $y$ with a time delay of 6 min. Twenty-two process measurements and eleven manipulated variables, i.e., MEAS1 through MEAS22 and MV1 through MV11, are chosen as input data $X$. Detailed information on $X$ and $y$ is given in Table IV. First, 480 normal sample pairs of $X$ and $y$ are centered to zero mean and scaled to unit variance, and are used to build a T-PLS model with chosen $A$, $A_y$, and $A_r$. There are 15 types of known faults in TEP, which are represented as IDV (1)–(15). However, only eight of them can be seen as output-relevant faults, namely IDV 1, 2, 5, 6, 8, 10, 12, and 13 [10]. Figs. 3 and 4 show output $y$ and its prediction error under the normal case and different faulty cases, which clearly indicates that these faults affect $y$ significantly. The faults in Fig. 3 start from the first sample. Table V provides simple descriptions for these faults. Based on the prebuilt T-PLS model from normal historical data, these faults can be detected sensitively. Fig. 5 shows the fault detection results for the output-relevant faults using 480 training samples.

When prior faulty data are available for a known type of fault, fault directions should be extracted from faulty samples first. According to [25] and [28], the reduced fault direction matrices $\tilde{\Xi}_i$ are extracted from 480 faulty samples for each fault, by performing SVD on

$$\tilde{X}_f^T = U D V^T \tag{21}$$

where the diagonal matrix $D$ contains singular values in descending order. The reduced fault direction $\tilde{\Xi}_i$ consists of the leading columns of $U$, where the number of columns is determined as the minimum dimension which brings the reconstructed $\varphi$ under the control limit. As $U$ is an orthogonal matrix, $\tilde{\Xi}_i$ has full column rank. It is shown in Fig. 6 that the majority of the reconstructed $\varphi$ are under the control limit using fault directions with proper dimensions. In practice, TEP may be affected by several kinds of faults simultaneously, which requires fault isolation. However, it is quite difficult to consider multiple faults in this work. Therefore, only one fault is assumed to occur in each test data set in this section.

TABLE IV
PROCESS VARIABLES AND QUALITY VARIABLE

Fig. 3. Delayed quality measurements $y$ for normal and eight output-relevant fault cases.

C. Fault Diagnosis Without Prior Knowledge of Faults

After an output-relevant fault is detected, one is concerned with which variables are significantly related to this fault. Contribution plots or reconstruction-based contributions can provide a preliminary decision on faulty variables without prior knowledge of faults; thus they are used and compared in this section. IDV (6) (step) and IDV (8) (random) are taken as examples to show the usage of variable RBCs. For each fault in this paper, testing data consist of 960 samples and the abnormal situation is introduced to the process at the 160th sample.
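The two diagnostics compared in this section use the same matrix but weight it differently, which a small sketch makes concrete. It is a minimal illustration assuming the detection index is $x^T \Phi x$ with a symmetric positive semidefinite $\Phi$ whose diagonal is strictly positive; the helper names `sqrtm_psd`, `contributions`, and `variable_rbc` are hypothetical.

```python
import numpy as np

def sqrtm_psd(Phi):
    """Symmetric square root of a positive semidefinite matrix via eigendecomposition."""
    lam, V = np.linalg.eigh(Phi)
    return (V * np.sqrt(np.clip(lam, 0.0, None))) @ V.T

def contributions(x, Phi):
    """Complete-decomposition contributions: squared entries of Phi^{1/2} x."""
    return (sqrtm_psd(Phi) @ x) ** 2

def variable_rbc(x, Phi):
    """Variable RBCs: squared entries of Phi x divided by the diagonal of Phi."""
    return (Phi @ x) ** 2 / np.diag(Phi)
```

The traditional contributions sum to the index, whereas the variable RBCs do not; in exchange, for a pure single-sensor fault the RBC of the faulty variable equals the whole index and is the largest, a guarantee that the traditional contributions do not carry.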


Fig. 4. Prediction error of y for normal and eight output-relevant fault cases by T-PLS models.

TABLE V FAULT DETECTION RATE AND FAULT DIRECTION EXTRACTION

IDV (6) involves a step loss in A feed in Stream 1. When this fault occurs, there is a sudden loss of measurement of A feed, and the A feed flow is then increased by feedback control. As the material balance is broken, many process variables are affected significantly by the fault, including output $y$ (see Fig. 3). Fig. 7 shows the detection result of this fault. As variable RBCs and traditional contributions are similar among different samples, the 200th sample is taken as an example, which is plotted in Fig. 8. Notice that the arrow in Fig. 7 points out the detection index of this faulty sample, which is far above the control limit. As $\varphi$ in Fig. 7 is too large, making the control limit hard to observe, a semi-log plot is used. From Fig. 8, it is observed that variables 1 and 25 have larger contributions to $\varphi$ than others; these correspond to the A feed measurement and the A feed flow (manipulated variable), respectively. In this fault case, contribution plots identify the same faulty variables as variable RBCs do, and the fault is effectively diagnosed.

Fig. 5. Fault detection index $\varphi$ for normal and eight output-relevant fault cases. The control limit is plotted as a solid horizontal line.

IDV (8) induces a random fault in the A, B, C feed compositions of stream 4, which affects process variable 4 directly. Other variables are influenced by feedback control, including quality $y$ and manipulated variable 26. Given a test data set of 960 faulty samples, the fault is first detected at the 178th point as shown in Fig. 9. The RBCs and contribution plots of sample 200 are plotted in Fig. 10. From the result of RBC, it is observed that four groups of variables contribute to $\varphi$, which are variables 3–5, 9–11, 18–22, and 26–28. From the result of traditional contribution plots, variables 11, 18, 19, 20, 22, and 27 contribute most to the fault detection index, which are a subset of the variables that RBC gives. However, the variables identified by the contribution plots do not include variable 4 as a contributing variable. It can be seen that RBC is more effective than traditional contribution plots.

D. Fault Diagnosis With Known Fault Directions

If there is prior knowledge of faults, the fault-specific RBC can be used to diagnose the fault conclusively. Note that it is still necessary to include variable RBCs to analyze the faulty variables associated with the given fault. In this subsection, IDV (1) (step) and IDV (10) (random) are used to show the joint usage of variable and fault-specific RBCs.

When IDV (1) is introduced to the testing data at the 160th sample, a step change is induced in the A/C feed ratio in Stream 4, which causes an increase in the C feed and a decrease in the A feed in Stream 4. This leads to a decrease in the A feed in the recycle stream 5, and a control loop reacts to increase the A feed in Stream 1. These two effects counteract each other over time. The fault is detected at the 164th point as shown in Fig. 11. All generalized RBCs, including variable and fault-specific RBCs, are plotted for sample 200 in Fig. 12. It can be found that variable groups 3–5, 9–11, 18–22, and 26–28 are mostly related to this fault, which is similar to IDV (8). This phenomenon can be explained from the descriptions of IDV (1) and IDV (8), which are both related to the A, B, and C feeds. The fault-specific RBCs are shown as RBCs 34–41 in Fig. 12. Results indicate that none of the variable RBCs exceed the reconstruction limit, while five fault-specific RBCs do, which can be seen as candidates for the actual fault. The reconstructed $\varphi$ for all samples, including the 200th sample, with each candidate fault direction are shown in Fig. 13. It is clear that only the direction of IDV (1) brings the reconstructed $\varphi$ back into the normal region for all samples. This indicates that IDV (1) is the actual fault, which also shows that using control limits for contributions is not reliable. The results in Fig. 13 are analogous to the work by Valle et al. [28] for PCA-based monitoring.
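The fault directions used in this kind of diagnosis come from historical faulty data. A hedged sketch of that extraction step is given below: it takes the SVD of the output-relevant part of the faulty samples and keeps the smallest number of leading left singular vectors whose reconstruction brings the index under a limit. The function name `extract_direction`, the argument `Phi_half` (a symmetric square root of the index matrix), and the acceptance fraction `frac` are assumptions for illustration.

```python
import numpy as np

def extract_direction(Xf, Phi_half, limit, frac=1.0):
    """Smallest-dimension fault direction that brings the index under `limit`.

    Xf       : faulty samples, one per row (n_f x m)
    Phi_half : symmetric square root of the index matrix (m x m)
    limit    : control limit of the detection index
    frac     : fraction of samples that must fall under the limit
    """
    Xt = Xf @ Phi_half                                # output-relevant parts, row-wise
    U, _, _ = np.linalg.svd(Xt.T, full_matrices=False)
    for l in range(1, U.shape[1] + 1):
        Xi = U[:, :l]                                 # orthonormal candidate direction
        resid = Xt - (Xt @ Xi) @ Xi.T                 # samples reconstructed along Xi
        if np.mean(np.sum(resid ** 2, axis=1) <= limit) >= frac:
            return Xi
    return U
```

The returned matrix plays the role of the output-relevant fault direction; since its columns are orthonormal, it has full column rank by construction.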


Fig. 6. Reconstructed index $\varphi$ after extracting fault directions of selected dimensions. The control limit for normal samples is represented by a solid horizontal line.

Fig. 7. Fault detection results for IDV (6).

When IDV (10) occurs, a random change happens in the temperature of C feed, which is compensated by feedback control. In Fig. 14, the fault is detected at the 185th sample. Fig. 15 depicts the variable and fault-specific RBCs together, and indicates several potential fault types, including (sensor fault direction) and (process fault directions). The reconstructed indices with these directions are plotted in Fig. 16, which indicates that IDV(10) and IDV(1) should be considered as the most likely fault types. This result means IDV (10) and IDV (1) are indistinguishable in terms of the output impact.

Fig. 8. Variable RBCs and contribution plots of the 200th sample for IDV (6).

Using the indices that are complementary to the output-relevant index could further distinguish IDV(10) from IDV(1), which is beyond the scope of this paper.

Fig. 9. Fault detection results for IDV (8).

Fig. 10. Variable RBCs and contribution plots of the 200th sample for IDV (8).

Fig. 11. Fault detection results for IDV (1).

Fig. 12. Variable RBCs and fault-specific RBCs of the 200th sample for IDV (1).

V. CONCLUSION

In this paper, a generalized RBC method with T-PLS models is proposed for output-relevant fault diagnosis, which extends variable RBCs to fault-specific RBCs. Further study shows the geometric meaning of the generalized RBC method. The proposed method makes use of known fault directions for more decisive fault diagnosis results. The case study on the TEP is performed to show the usage of generalized RBC in detail. When there is no prior knowledge of faults, variable RBCs are used to determine the faulty variables. With prior knowledge of faults, the combination of fault-specific RBCs and variable RBCs can improve the diagnosis performance. However, the diagnosis results also reflect the limitation of using the output-relevant index for fault diagnosis, as it measures only a subspace of the input space that is relevant to the output data. The diagnosability can be enhanced by additionally using the indices complementary to it. Note that it is not advisable to use control limits to determine the ultimate fault type directly.

APPENDIX I

A. T-PLS Algorithm [10]

Center and scale the raw data to give $X$ and $Y$.
1) Perform the nonlinear iterative partial least squares (NIPALS) algorithm on the data pair $X$ and $Y$, and train a PLS model as shown in (1); see the NIPALS algorithm in Appendix I-B for reference.
2) Run PCA on $\hat{Y} = TQ^T$ with $A_y$ components, $\hat{Y} = T_y Q_y^T$, where $A_y = \mathrm{rank}(Q)$.
3) Let $\hat{X}_o = \hat{X} - T_y P_y^T$, where $\hat{X} = TP^T$ and $P_y^T = (T_y^T T_y)^{-1} T_y^T \hat{X}$.
4) Run PCA on $\hat{X}_o$ with $A - A_y$ components, $\hat{X}_o = T_o P_o^T$.
5) Run PCA on $E$ with $A_r$ components, $E = T_r P_r^T + E_r$, where $A_r$ is determined using PCA methods, such as the cumulative percent variance criterion.

B. NIPALS Algorithm [7]

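Set the initial residual matrices to the scaled data $X$ and $Y$. The number of PLS components is determined by cross-validation [7].
1) Set the output score vector to any column of the output residual matrix.
2) Compute the input weight vector.
3) Compute the input score vector.
4) Compute the output loading vector.
5) Update the output score vector. If the input score vector converges, go to Step 6; otherwise return to Step 2.
6) Compute the input loading vector.
7) Deflate the input and output residual matrices. Move to the next component and return to Step 1. Terminate when the required number of components has been extracted.
Collect the score and loading vectors of all components into matrices; then model (1) is obtained. See [8] for a reference on the PCA model.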
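The two procedures in Appendix I can be sketched together in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the NIPALS loop uses one common normalization of the weight and loading vectors, which may differ from the variant in [7]; the T-PLS steps follow the T-PLS algorithm of [10] as it is commonly stated; PCA is done by SVD; and the data, the component numbers `A` and `A_r`, and the function names `nipals_pls`, `pca`, and `tpls` are hypothetical.

```python
import numpy as np

def nipals_pls(X, Y, A, tol=1e-10, max_iter=500):
    """PLS by NIPALS with A components: X = T P^T + E, Y = T Q^T + F."""
    E, F = X.copy(), Y.copy()
    T, P, Q = [], [], []
    for _ in range(A):
        u = F[:, [0]]                          # any column of the output residual
        t_old = np.zeros((X.shape[0], 1))
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)             # input weight
            t = E @ w                          # input score
            q = F.T @ t / (t.T @ t)            # output loading
            u = F @ q / (q.T @ q)              # output score
            if np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break
            t_old = t
        p = E.T @ t / (t.T @ t)                # input loading
        E = E - t @ p.T                        # deflate both residual matrices
        F = F - t @ q.T
        T.append(t)
        P.append(p)
        Q.append(q)
    return np.hstack(T), np.hstack(P), np.hstack(Q), E

def pca(Z, k):
    """Scores and loadings of the first k principal components of Z."""
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:k].T
    return Z @ loadings, loadings

def tpls(X, Y, A, A_r):
    """Split X into output-relevant, output-orthogonal, and residual parts."""
    T, P, Q, E = nipals_pls(X, Y, A)
    A_y = np.linalg.matrix_rank(Q)
    T_y, Q_y = pca(T @ Q.T, A_y)               # PCA on the predicted output
    X_hat = T @ P.T
    P_y = (np.linalg.pinv(T_y) @ X_hat).T      # regress X_hat on T_y
    T_o, P_o = pca(X_hat - T_y @ P_y.T, A - A_y)
    T_r, P_r = pca(E, A_r)                     # PCA on the PLS residual
    E_r = E - T_r @ P_r.T
    return (T_y, P_y), (T_o, P_o), (T_r, P_r), E_r, (T, Q, Q_y)

# Synthetic data, centered and scaled as in the first step of the appendix.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
Y = X[:, :3] @ rng.standard_normal((3, 2)) + 0.1 * rng.standard_normal((50, 2))
X = (X - X.mean(0)) / X.std(0)
Y = (Y - Y.mean(0)) / Y.std(0)

(T_y, P_y), (T_o, P_o), (T_r, P_r), E_r, (T, Q, Q_y) = tpls(X, Y, A=3, A_r=2)
X_rebuilt = T_y @ P_y.T + T_o @ P_o.T + T_r @ P_r.T + E_r
print(np.allclose(X, X_rebuilt))
```

The final check confirms that the four parts sum back to the scaled input data, which is the property that lets separate indices be defined on each part.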


Fig. 13. Reconstructed $\varphi$ with fault directions of five possible fault types.

Fig. 15. Variable RBCs and fault-specific RBCs at the 200th sample for IDV (10).

Fig. 14. Fault detection results for IDV (10).

APPENDIX II

A. Proof of Theorem 1

When has full column rank, . Note that

(22)

where is an idempotent and symmetric matrix due to a property of the MP pseudoinverse, means the actual fault magnitude, and means the estimated fault magnitude with the actual fault direction. Thus, the reconstructed can be calculated by

(23)


Fig. 16. Reconstructed $\varphi$ with fault directions of seven possible fault types.

where means fault-free value, which should be within control. As

(24)

On the other hand, if , it can be obtained

(25)

where is an idempotent and symmetric matrix due to a property of the MP pseudoinverse.

REFERENCES

[1] Q. Chen and U. Kruger, “Analysis of extended partial least squares for monitoring large-scale processes,” IEEE Trans. Control Syst. Technol., vol. 13, no. 5, pp. 807–813, Sep. 2005.
[2] Z. Gao, T. Breikin, and H. Wang, “High-gain estimator and fault-tolerant design with application to a gas turbine dynamic system,” IEEE Trans. Control Syst. Technol., vol. 15, no. 4, pp. 740–753, Jul. 2007.
[3] X. Wang, U. Kruger, G. Irwin, G. McCullough, and N. McDowell, “Nonlinear PCA with the local approach for diesel engine fault detection and diagnosis,” IEEE Trans. Control Syst. Technol., vol. 16, no. 1, pp. 122–129, Jan. 2008.

[4] Z. Gao, X. Dai, T. Breikin, and H. Wang, “Novel parameter identification by using a high-gain observer with application to a gas turbine engine,” IEEE Trans. Ind. Inform., vol. 4, no. 4, pp. 271–279, Jul./Aug. 2008.
[5] J. V. Kresta, J. F. MacGregor, and T. E. Marlin, “Multivariate statistical monitoring of process operating performance,” Canadian J. Chem. Eng., vol. 69, no. 1, pp. 35–47, 1991.
[6] B. M. Wise and N. B. Gallagher, “The process chemometrics approach to process monitoring and fault detection,” J. Process Control, vol. 6, pp. 329–348, 1996.
[7] J. F. MacGregor, C. Jaeckle, C. Kiparissides, and M. Koutoudi, “Process monitoring and diagnosis by multiblock PLS methods,” AIChE J., vol. 40, no. 5, pp. 826–838, 1994.
[8] S. J. Qin, “Statistical process monitoring: Basics and beyond,” J. Chemometrics, vol. 17, no. 8–9, pp. 480–502, 2003.
[9] G. Li, S. J. Qin, and D. H. Zhou, “Geometric properties of partial least squares for process monitoring,” Automatica, vol. 46, no. 1, pp. 204–210, 2010.
[10] D. Zhou, G. Li, and S. J. Qin, “Total projection to latent structures for process monitoring,” AIChE J., vol. 56, no. 1, pp. 168–178, 2010.
[11] A. Raich and A. Cinar, “Statistical process monitoring and disturbance diagnosis in multivariable continuous processes,” AIChE J., vol. 42, no. 4, pp. 995–1009, 1996.
[12] S. Yoon and J. MacGregor, “Fault diagnosis with multivariate statistical models part I: Using steady state fault signatures,” J. Process Control, vol. 11, no. 4, pp. 387–400, 2001.
[13] M. Kano, S. Hasebe, I. Hashimoto, and H. Ohno, “Statistical process monitoring based on dissimilarity of process data,” AIChE J., vol. 48, no. 6, pp. 1231–1240, 2002.
[14] M. Kano, K. Nagao, S. Hasebe, I. Hashimoto, H. Ohno, R. Strauss, and B. Bakshi, “Comparison of multivariate statistical process monitoring methods with applications to the Eastman challenge problem,” Comput. Chem. Eng., vol. 26, no. 2, pp. 161–174, 2002.
[15] S. Qin and W. Li, “Detection, identification, and reconstruction of faulty sensors with maximized sensitivity,” AIChE J., vol. 45, no. 9, pp. 1963–1976, 1999.
[16] S. J. Qin and W. Li, “Detection and identification of faulty sensors in dynamic processes,” AIChE J., vol. 47, no. 7, pp. 1581–1593, 2001.
[17] J. Gertler, W. Li, Y. Huang, and T. McAvoy, “Isolation enhanced principal component analysis,” AIChE J., vol. 45, no. 2, pp. 323–334, 1999.
[18] J. A. Westerhuis, S. P. Gurden, and A. K. Smilde, “Generalized contribution plots in multivariate statistical process monitoring,” Chemometrics Intell. Lab. Syst., vol. 51, no. 1, pp. 95–114, 2000.
[19] S. W. Choi and I. B. Lee, “Multiblock PLS-based localized process diagnosis,” J. Process Control, vol. 15, no. 3, pp. 295–306, 2005.
[20] C. Alcala and S. J. Qin, “Unified analysis of diagnosis methods for process monitoring,” in Proc. IFAC Safeprocess, 2009, pp. 1007–1012.
[21] G. Li, D. H. Zhou, Y. D. Ji, and S. J. Qin, “Total PLS based contribution plots for fault diagnosis,” Acta Automatica Sinica, vol. 35, no. 6, pp. 759–765, 2009.
[22] C. Alcala and S. Qin, “Reconstruction-based contribution for process monitoring,” Automatica, vol. 45, no. 7, pp. 1593–1600, 2009.
[23] R. Dunia and S. J. Qin, “Subspace approach to multidimensional fault identification and reconstruction,” AIChE J., vol. 44, no. 8, pp. 1813–1831, 1998.
[24] H. Yue and S. Qin, “Reconstruction-based fault identification using a combined index,” Ind. Eng. Chem. Res., vol. 40, no. 20, pp. 4403–4414, 2001.
[25] G. Li, D. H. Zhou, and S. J. Qin, “Output relevant fault reconstruction and fault subspace extraction in total projection to latent structures models,” Ind. Eng. Chem. Res., 2010, doi: 10.1021/ie901939n, accepted for publication.
[26] G. E. P. Box, “Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification,” Annals Math. Stat., vol. 25, no. 2, pp. 290–302, 1954.
[27] C. Alcala and S. J. Qin, “Analysis and generalization of fault diagnosis methods for process monitoring,” J. Process Control, 2010, submitted for publication.
[28] S. Valle, S. Qin, M. Piovoso, M. Bachmann, and N. Mandakoro, “Extracting fault subspaces for fault identification of a polyester film process,” in Proc. ACC, 2001, pp. 4466–4471.
[29] J. J. Downs and E. F. Vogel, “A plant-wide industrial process control problem,” Comput. Chem. Eng., vol. 17, no. 3, pp. 245–255, 1993.
[30] D. J. H. Wilson and G. W. Irwin, “PLS modelling and fault detection on the Tennessee Eastman benchmark,” in Proc. Amer. Control Conf., 1999, pp. 3975–3979.
[31] G. Lee, C. H. Han, and E. S. Yoon, “Multiple-fault diagnosis of the Tennessee Eastman process based on system decomposition and dynamic PLS,” Ind. Eng. Chem. Res., vol. 43, no. 25, pp. 8037–8048, 2004.
[32] L. H. Chiang, E. L. Russell, and R. D. Braatz, “Fault diagnosis in chemical processes using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis,” Chemometrics Intell. Lab. Syst., vol. 50, no. 2, pp. 243–252, 2000.

Gang Li received the B.S. degree from the Department of Precision Instruments and Manufacturing, Tsinghua University, Beijing, China, in 2004. He is currently pursuing the Ph.D. degree in the Department of Automation, Tsinghua University. His research interests cover statistical process monitoring, dynamic process modeling, and data-driven fault diagnosis and prognosis.


Carlos F. Alcala received the B.S. degree in chemical engineering from the Technological Institute of Ciudad Madero, Mexico, in 2004. In 2005, he was awarded a Fulbright Fellowship to pursue an M.S. degree in chemical engineering at the University of Texas at Austin. He is currently pursuing the Ph.D. degree in chemical engineering at the University of Southern California, Los Angeles. His research interests include the fault detection and diagnosis of linear and nonlinear processes with multivariate statistical methods. During his graduate studies, he has performed internships at Capstone Technology, Seattle, WA; North Microelectronics, Beijing, China; and the Dow Chemical Company, Freeport, TX. Mr. Alcala was a recipient of a Roberto Rocca Fellowship.

S. Joe Qin (SM’02) is the Fluor Professor with the Viterbi School of Engineering, University of Southern California, Los Angeles. He received the B.S. and M.S. degrees in automatic control from Tsinghua University, Beijing, China, in 1984 and 1987, respectively, and the Ph.D. degree in chemical engineering from University of Maryland, College Park, in 1992. His research interests include statistical process monitoring and fault diagnosis, model predictive control, system identification, run-to-run control, and control performance monitoring. Dr. Qin was a recipient of the National Science Foundation CAREER Award, NSF-China Outstanding Young Investigator Award, and an IFAC Best Paper Prize for Control Engineering Practice. He is currently an Associate Editor for the Journal of Process Control and the IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, and Member of the Editorial Board for Journal of Chemometrics. He served as Editor for Control Engineering Practice and Associate Editor for IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY.

Donghua Zhou (SM’02) received the B.Eng., M. Sci., and Ph.D. degrees from the Department of Electrical Engineering, Shanghai Jiaotong University, Shanghai, China, in 1985, 1988, and 1990, respectively. He was an Alexander von Humboldt Research Fellow (1995–1996) with the University of Duisburg, Germany, and a Visiting Scholar with Yale University (July 2001–Jan. 2002). He is currently a Professor with the Department of Automation, Tsinghua University, Beijing, China. He has published over 70 international journal papers and 3 monographs in the areas of process identification, diagnosis, and control. Dr. Zhou serves the profession in many capacities such as IFAC Technical Committee on Fault Diagnosis and Safety of Technical Processes, Associate Editor of Journal of Process Control, Deputy General Secretary of Chinese Association of Automation (CAA), and Council member of CAA. He was also the NOC Chair of the 6th IFAC Symposium on SAFEPROCESS, 2006.