Journal of Process Control 54 (2017) 47–64
Journal of Process Control journal homepage: www.elsevier.com/locate/jprocont
Fault detection using multiscale PCA-based moving window GLRT
M. Ziyan Sheriff a,b, Majdi Mansouri c, M. Nazmul Karim a, Hazem Nounou c, Mohamed Nounou b,∗
a Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77840, United States
b Chemical Engineering Program, Texas A&M University at Qatar, Doha, Qatar
c Electrical and Computer Engineering Program, Texas A&M University at Qatar, Doha, Qatar
Article info

Article history: Received 26 June 2016; received in revised form 16 January 2017; accepted 10 March 2017; available online 22 March 2017.

Keywords: Multiscale principal component analysis; Generalized likelihood ratio test; Moving window; Tennessee Eastman Process; Fault detection

Abstract

The presence of measurement errors (noise) in the data and model uncertainties degrade the performance of fault detection (FD) techniques. Therefore, an objective of this paper is to enhance the quality of FD by suppressing the effect of these errors using wavelet-based multiscale representation of data, which is a powerful feature extraction tool. Multiscale representation of data has been used to improve the FD abilities of principal component analysis. Thus, combining the advantages of multiscale representation with those of hypothesis testing should provide further improvements in FD. To do that, a moving window generalized likelihood ratio test (MW-GLRT) method based on multiscale principal component analysis (MSPCA) is proposed for FD. The dynamical multiscale representation is proposed to extract the deterministic features and decorrelate autocorrelated measurements. An extension of the popular hypothesis testing GLRT method is applied on the residuals from the MSPCA model, in order to further enhance the fault detection performance. In the proposed MW-GLRT method, the detection statistic equals the norm of the residuals in that window, which is equivalent to applying a mean filter on the squares of the residuals. This means that a proper moving window length needs to be selected, which is similar to estimating the mean filter length in data filtering. The fault detection performance of the MSPCA-based MW-GLRT chart is illustrated through two examples, one using synthetic data, and the other using simulated Tennessee Eastman Process (TEP) data. The results demonstrate the effectiveness of the MSPCA-based MW-GLRT method over the conventional PCA-based and MSPCA-based GLRT methods, and both of them provide better performance results when compared with the conventional PCA and MSPCA methods, through their respective $T^2$ and Q charts. © 2017 Elsevier Ltd. All rights reserved.
1. Introduction Process monitoring is an integral part of chemical processes, required to maintain product quality and ensure safe operation of plants. Process monitoring is carried out mainly in two steps, fault detection and diagnosis. In this paper, we will focus on the fault detection task. Fault detection techniques can be classified using various methodologies. A popular method of classification is followed by the authors in [1] where FD models are classified into three different categories: (i) quantitative model-based methods, (ii) qualitative models and search strategies and (iii) process data-based methods [2–4]. Principal Component Analysis (PCA) lies
∗ Corresponding author: M. Nounou. E-mail addresses: [email protected], [email protected]. http://dx.doi.org/10.1016/j.jprocont.2017.03.004 0959-1524/© 2017 Elsevier Ltd. All rights reserved.
under the third category because it uses historical process data to obtain a statistical (PCA) model. Data-based monitoring methods, especially those that utilize PCA or its extensions, have been widely used in many applications in various industries, e.g., air quality monitoring [5], chemical industry [6], water treatment [7], ecology [8], pharmacology [9], biology and biotechnology [10], agriculture [11], health [12], semiconductors [13], rolling bearings [14,15], and many others. PCA provides linear combinations of a set of variables that summarize most of the information in the data using a smaller number of variables. In mathematical terms, PCA is based on the orthogonal decomposition of the covariance matrix of the process variables along the directions that give the maximum variations of the data. The transformed variables obtained from PCA models can be used to detect the presence of faults in the data [1,5,16–19]. The main detection indices used with PCA are the Hotelling statistic, $T^2$, and the sum of squared residuals, SPE, or Q statistic. The $T^2$ statistic is a measure of the variations captured by the PCA model, and the
Q statistic is a measure of the amount of variations not captured by the PCA model, i.e., variations in the residual space. However, process data collected from most chemical and refinery processes may contain high levels of noise, autocorrelation, and might deviate from normality, and this might adversely affect the performance of the conventional PCA monitoring chart in terms of modeling and fault detection. To address these issues, Bakshi [20] has developed a multiscale wavelet-based multivariate process monitoring technique through Multiscale Principal Component Analysis (MSPCA), in which, the MSPCA computes the PCA of the wavelet coefficients at each scale. Due to its multiscale nature, MSPCA has been shown to be more appropriate for modeling of data containing contributions from events whose behavior changes over time and frequency [21,22]. The Generalized Likelihood Ratio Test (GLRT) chart has recently been incorporated with multivariate monitoring charts in order to improve the fault detection performance of many multivariate process monitoring charts [23,24]. In this work, we show that the performance of the GLRT chart can be further improved through its implementation in a moving window of lagged residuals. In our previous work [23], we have successfully applied generalized likelihood ratio test (GLRT) for model-based fault detection. GLRT is a popular composite hypothesis testing method known to provide better fault detection performance compared to conventional T2 and Q statistics. In the moving-window GLRT (MW-GLRT) method, the detection statistic equals the norm of the residuals in that window, which is equivalent to applying a mean filter on the squares of the residuals [25,26]. Therefore, this paper aims to extend the MSPCA technique through the utilization of the MW-GLRT chart in order to improve fault detection performance. 
This is accomplished by using MSPCA to model the available process data, and then applying the MW-GLRT chart on the residuals obtained from MSPCA model in order to detect faults. The fault detection performance of the MSPCA-based MW-GLRT method is assessed and compared to existing techniques through two examples using simulated synthetic data, and the Tennessee Eastman Process (TEP). The rest of the paper is organized as follows. In Section 2, an introduction to PCA is provided along with descriptions of the T2 and Q fault detection indices, followed by an introduction to multiscale wavelet-based data representation and a description of the multiscale PCA algorithm. Section 3 introduces statistical hypothesis testing and generalized likelihood ratio charts. Then, in Section 4, the proposed algorithm which integrates the fault detection abilities of the MSPCA model with the MW-GLRT chart is presented, along with detailed discussions on the algorithm. Section 5 aims to assess the performance of the developed MSPCA-based MW-GLRT control chart through two examples, using simulated synthetic data, and the Tennessee Eastman Process data. Conclusions are then presented in Section 6.
2. Principal component analysis (PCA) and multiscale principal component analysis methods

2.1. Principal component analysis (PCA)

PCA is a popular linear dimensionality reduction technique, which is particularly useful to deal with data that contain a high degree of correlation between variables [27]. Many processes contain large amounts of data, and dimensionality reduction is particularly advantageous for fault detection purposes as it reduces the computational time and storage space that may be required [26,28]. Consider the data matrix $X \in \mathbb{R}^{n \times m}$ of m process variables and n observations, $X = [x_1^T, x_2^T, \ldots, x_n^T]^T \in \mathbb{R}^{n \times m}$, where $x_i \in \mathbb{R}^m$. Before building the PCA model, the data is usually preprocessed. Each process variable in the data matrix is scaled to zero mean and unit variance. This is required because different process variables are measured in different units and have different means and standard deviations.

Fig. 1. Schematic illustration of PCA model.

After data preprocessing, using singular value decomposition (SVD), the data matrix can be expressed as follows [23]:

$$X = T P^T, \tag{1}$$

where $T = [t_1, t_2, \ldots, t_m] \in \mathbb{R}^{n \times m}$ is a matrix containing the transformed variables, $t_i \in \mathbb{R}^n$, which are called the principal components or score vectors, and $P = [p_1, p_2, \ldots, p_m] \in \mathbb{R}^{m \times m}$ is a matrix containing the orthogonal vectors $p_i \in \mathbb{R}^m$, also called the loading vectors, which are the eigenvectors associated with the covariance matrix of the data matrix X. The covariance matrix ($\Sigma$) is defined as follows [29]:

$$\Sigma = \frac{1}{n-1} X^T X = P \Lambda P^T \quad \text{with} \quad P P^T = P^T P = I_m, \tag{2}$$

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$ is a diagonal matrix containing the eigenvalues related to the m PCs ($\lambda_1 > \lambda_2 > \cdots > \lambda_m$), and $I_m$ is the identity matrix [29]. It is important to note that the PCA model utilizes the same number of principal components as the number of original process variables (m) used. However, when process variables that are highly correlated are available, it is possible to use a smaller number of principal components (l) in order to capture most of the variations in the data [27]. The quality of the PCA model depends mainly on how many principal components are retained. Underestimating the number of principal components to retain may degrade the prediction ability of the PCA model, while overestimating the number of principal components could introduce noise that may mask important features in the data [24,30]. Hence, selecting the number of principal components is important, and several techniques have been developed for this purpose. These techniques include Cumulative Percent Variance (CPV) [24], cross validation [31], the Scree plot and profile likelihood [32], to name a few. CPV is often used as it is relatively easy to compute, and provides a good estimate of the number of principal components to retain for most practical applications. CPV can be computed using the following equation [24]:

$$\mathrm{CPV}(l) = \frac{\sum_{i=1}^{l} \lambda_i}{\mathrm{trace}(\Sigma)} \times 100. \tag{3}$$
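The CPV rule of Eq. (3) can be sketched in a few lines. This is only an illustration: the eigenvalues below are hypothetical, the function names are not from the paper, and the 95% target is the example value quoted in the text. The sketch relies on the fact that the trace of the covariance matrix equals the sum of all its eigenvalues.

```python
def cumulative_percent_variance(eigenvalues, l):
    """CPV(l) of Eq. (3): percentage of total variance held by the l largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    # trace(Sigma) equals the sum of all eigenvalues of Sigma
    return 100.0 * sum(ordered[:l]) / sum(ordered)


def select_num_components(eigenvalues, target=95.0):
    """Smallest l whose CPV reaches `target` percent."""
    for l in range(1, len(eigenvalues) + 1):
        if cumulative_percent_variance(eigenvalues, l) >= target:
            return l
    return len(eigenvalues)


# Hypothetical eigenvalues of a 5-variable covariance matrix (illustration only).
example_eigenvalues = [2.5, 1.5, 0.8, 0.15, 0.05]
num_retained = select_num_components(example_eigenvalues, target=95.0)
print(num_retained)
```

With these made-up eigenvalues, two components capture about 80% of the variance and three capture about 96%, so three components would be retained.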
The number of principal components retained is the smallest number that captures a specified percentage of the total variance (e.g., 95%).

Fig. 2. Schematic diagram for multiscale representation of data.

After determining the number of principal components (l) to retain, the data matrix can be expressed as follows [24]:

$$X = T P^T = [\hat{T} \;\; \tilde{T}]\,[\hat{P} \;\; \tilde{P}]^T, \tag{4}$$

where $\hat{T} \in \mathbb{R}^{n \times l}$ and $\tilde{T} \in \mathbb{R}^{n \times (m-l)}$ are the matrices that contain the l retained principal components and the ignored (m−l) principal components, respectively. Similarly, $\hat{P} \in \mathbb{R}^{m \times l}$ and $\tilde{P} \in \mathbb{R}^{m \times (m-l)}$ are the matrices that contain the l retained eigenvectors and the ignored (m−l) eigenvectors. Expanding Eq. (4) leads to the following expression [24]:

$$X = \hat{T}\hat{P}^T + \tilde{T}\tilde{P}^T = \underbrace{X \hat{P}\hat{P}^T}_{\hat{X}} + \underbrace{X \left(I_m - \hat{P}\hat{P}^T\right)}_{E} \tag{5}$$
where $\hat{X}$ represents the modeled variation of X computed using the first l components, while the matrix E represents the variations that correspond to the process noise. The PCA model is illustrated in Fig. 1 [24]. Once the PCA model is constructed, fault detection can be carried out using different fault detection indices. PCA utilizes two fault detection indices in order to carry out process monitoring: the Hotelling statistic ($T^2$), and the sum of squared residuals (i.e., the squared prediction error (SPE), or Q statistic) [23]. The $T^2$ statistic captures the variation in the PCA model, while the Q statistic examines the variance not captured by the PCA model (i.e., the residual space produced by the PCA model). A subset of the principal components that can extract most of the important features in the data, corresponding to the largest eigenvalues, makes its analysis simpler [24].

2.2. Multiscale principal component analysis (MSPCA)

2.2.1. Multiscale wavelet-based data representation

Wavelet-based multiscale representation is a powerful data-analysis tool that provides efficient separation of deterministic and stochastic features [33,34]. Given a time domain data set (signal), a coarser approximation of the signal (called a scaled signal) can be obtained by convolving the original signal with a low pass filter (h) (Fig. 2), which is derived from a scaling basis function of the following form [35]:

$$\phi_{jk}(t) = \sqrt{2^{-j}}\, \phi\left(2^{-j} t - k\right), \tag{6}$$

where k and j are discretized translation and dilation parameters, respectively. The detail signal, which is the difference between the original and the approximated signals, can be obtained by convolving the original signal with a high pass filter (g) (Fig. 2), which is derived from a wavelet basis function of the following form [35]:

$$\psi_{jk}(t) = \sqrt{2^{-j}}\, \psi\left(2^{-j} t - k\right). \tag{7}$$

Fig. 3. Schematic illustration of MSPCA model.

After repeating these approximations, the original signal can be represented as the sum of the last scaled signal and all detail signals, i.e. [35]:

$$x(t) = \sum_{k=1}^{n 2^{-J}} a_{Jk}\, \phi_{Jk}(t) + \sum_{j=1}^{J} \sum_{k=1}^{n 2^{-j}} d_{jk}\, \psi_{jk}(t), \tag{8}$$

where n and J represent the length of the signal and the maximum possible decomposition depth, respectively. Fig. 2 shows a schematic diagram that illustrates this multiscale representation process.
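The repeated low-pass/high-pass filtering described above can be illustrated with the Haar filter pair. The choice of Haar is an assumption made here for brevity (this section does not name a specific wavelet), and the function names and the sample signal are hypothetical. The sketch returns the last scaled signal $a_J$ and the detail signals $d_1, \ldots, d_J$, and checks nothing more than what Eq. (8) states: the original signal can be rebuilt from them.

```python
import math


def haar_step(signal):
    """One level: low-pass (scaled) and high-pass (detail) coefficients, downsampled by 2."""
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, depth):
    """Return (a_J, [d_1, ..., d_J]) for a signal whose length is divisible by 2**depth."""
    if len(signal) % (2 ** depth) != 0:
        raise ValueError("signal length must be divisible by 2**depth")
    approx, details = list(signal), []
    for _ in range(depth):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose by undoing the levels from coarsest to finest."""
    s = 1.0 / math.sqrt(2.0)
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.extend([s * (a + d), s * (a - d)])
        current = rebuilt
    return current


# Hypothetical signal of length n = 8, decomposed to depth J = 3.
x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a_J, d = haar_decompose(x, depth=3)
x_rebuilt = haar_reconstruct(a_J, d)
```

At depth J the scaled signal has n·2^(−J) coefficients and the detail signal at scale j has n·2^(−j) coefficients, matching the summation limits in Eq. (8).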
2.2.2. MSPCA method description

The MSPCA model was developed by Bakshi [20] with the aim of combining the ability of PCA to extract cross-correlation between variables with the ability of orthonormal wavelets to separate features from noise and approximately decorrelate autocorrelated measurements [20,36,37]. Fig. 3 illustrates the MSPCA algorithm that was developed by Bakshi [20], and its key steps are highlighted in Algorithm 1.
Algorithm 1: MSPCA algorithm
Input: n × m data matrix X, confidence interval α.
1. For each column (i.e., process variable) in the data matrix, compute the wavelet decomposition;
2. For each block (matrix) of scaled and detail coefficients at each scale, the covariance matrix is computed along with the number of principal components, as well as PCA loadings and scores of those wavelet coefficients;
3. Once the appropriate number of loadings is selected, wavelet coefficients larger than a certain threshold are selected;
4. For all scales together, PCA is carried out after including only scales with significant events during reconstruction.
3. Statistical hypothesis testing

Statistical hypothesis testing is the methodology that uses statistics to determine whether a given hypothesis is true or false. In terms of fault detection, statistical hypothesis testing can be used to determine whether a particular process observation is in the normal (fault-free) state, consistent with the null hypothesis, or is faulty, consistent with the alternative hypothesis [25,38].
3.1. GLRT statistic

The GLRT is a hypothesis testing technique that has been implemented and utilized for model-based fault detection purposes, and seeks to maximize the detection probability for a given false alarm rate [39–41]. Let $y \in \mathbb{R}^n$ be an observation vector formed by one of the two Gaussian distributions: $\mathcal{N}(0, \sigma^2 I_n)$ or $\mathcal{N}(\theta \neq 0, \sigma^2 I_n)$, where $\theta$ is the mean vector (which is the value of the fault) and $\sigma^2 > 0$ is the variance, which is assumed to be known in this problem. The hypothesis test can be formulated as follows [24]:

$$\begin{cases} H_0: \; y \sim \mathcal{N}(0, \sigma^2 I_n) & \text{(null hypothesis)}, \\ H_1: \; y \sim \mathcal{N}(\theta, \sigma^2 I_n) & \text{(alternative hypothesis)}. \end{cases} \tag{9}$$

With the GLRT method, the unknown parameter, $\theta$, is replaced by its maximum likelihood estimate. The GLRT decision function T(y) is expressed as follows [24]:

$$\begin{aligned} T(y) &= 2 \log \frac{\sup_{\theta \in \mathbb{R}^n} f_{\theta}(y)}{f_{\theta=0}(y)} \\ &= 2 \log \frac{\sup_{\theta} \exp\left\{ -\frac{\|y - \theta\|_2^2}{2\sigma^2} \right\}}{\exp\left\{ -\frac{\|y\|_2^2}{2\sigma^2} \right\}} \\ &= \frac{1}{\sigma^2} \left\{ -\min_{\theta} \|y - \theta\|_2^2 + \|y\|_2^2 \right\} \\ &= \frac{1}{\sigma^2} \left\{ -\|y - \hat{\theta}\|_2^2 + \|y\|_2^2 \right\} = \frac{1}{\sigma^2} \|y\|_2^2, \end{aligned} \tag{10}$$

where $\hat{\theta} = \arg\min_{\theta} \|y - \theta\|_2^2 = y$ is the maximum likelihood estimate of $\theta$, the probability density function of y is $f_{\theta}(y) = \frac{1}{(2\pi)^{n/2} \sigma^{n}} \exp\left\{ -\frac{\|y - \theta\|_2^2}{2\sigma^2} \right\}$, and $\|\cdot\|_2$ represents the Euclidean norm.

Since GLRT uses the ratio of the distributions of fault-free and faulty data, for the case of non-Gaussian variables, non-Gaussian distributions would need to be used. For the derivations shown above, maximizing the likelihood function is equivalent to maximizing its natural logarithm, because the logarithmic function is monotonic [24]. To select the threshold for the GLRT statistic, its distribution needs to be determined. Since the noise is assumed to follow a Gaussian distribution, the test statistic leads to a chi-square distribution [42].

The normalized residual $\bar{E}$ is distributed as:

$$\bar{E} \sim \mathcal{N}(\theta, \sigma^2 I_n), \tag{11}$$

where $\theta = 0$ under the null hypothesis (9). Then, the scaled test statistic follows a chi-square distribution (non-central when $\theta \neq 0$):

$$T(y) = \frac{1}{\sigma^2} \|y\|_2^2 \sim \chi^2_n, \tag{12}$$

with n degrees of freedom. Since the GLRT is applied online and the norm is computed using only the current observation, the GLRT statistic follows a chi-square distribution with one degree of freedom, $\chi^2_1$ [23]. Although the GLRT chart does show improved performance, it is important to note that the statistic is computed using only the current observation. The literature has shown [25], through many charts, that a technique with increased process memory further improves the fault detection performance, which motivates the extension of the GLRT chart to one that incorporates a moving window. The formulation of this statistic is described next.

3.2. Moving window GLRT (MW-GLRT)

The moving-window GLRT statistic can be computed as follows:

$$\text{MW-GLRT} = 2 \log \frac{f_{\theta}(Y)}{f_{\theta=0}(Y)}, \tag{13}$$

where $Y = [\,y(i-(w-1)) \;\cdots\; y(i-1) \;\; y(i)\,]$, and i and w are the observation number and the length of the moving window, respectively. As with the GLRT chart, in order to establish the threshold for the MW-GLRT chart, the distribution that the derived MW-GLRT statistic follows needs to be determined. The moment generating function can be used to derive this distribution. The conventional GLRT statistic follows a chi-square distribution with one degree of freedom, and its moment generating function is of the following form:

$$M_{y_i}(t) = (1 - 2t)^{-\frac{n}{2}}, \tag{14}$$

where n represents the degree of freedom of a given chi-square distribution. The moment generating function for the MW-GLRT statistic can then be computed as follows [25]:

$$\begin{aligned} M_{Y_i}(t) &= \prod_{i=1}^{w} M_{y_i}(t) \\ &= (1-2t)^{-\frac{n_1}{2}} \cdot (1-2t)^{-\frac{n_2}{2}} \cdots (1-2t)^{-\frac{n_w}{2}} \\ &= (1-2t)^{-\frac{n_1 + n_2 + \cdots + n_w}{2}} \\ &= (1-2t)^{-\frac{w}{2}}. \end{aligned} \tag{15}$$

The last equality uses $n_1 = n_2 = \cdots = n_w = 1$, since each single-observation GLRT statistic has one degree of freedom. The derived expression shows that the MW-GLRT statistic follows a chi-square distribution with degree of freedom equal to the window length (WL), w [25].

Therefore, this work focuses on extending MSPCA, and developing a MSPCA-based MW-GLRT technique in order to improve the fault detection performance. The proposed MSPCA-based MW-GLRT algorithm is presented in the next section.

4. MSPCA-based MW-GLRT fault detection algorithm
The idea behind a MSPCA-based MW-GLRT fault detection algorithm is to incorporate the advantages brought forward by the MSPCA model with the MW-GLRT fault detection chart. This can be accomplished through the fault detection algorithm illustrated in Fig. 4. The algorithm which highlights the developed MSPCA-based MW-GLRT fault detection technique is presented in Algorithm 2. The MSPCA-based MW-GLRT algorithm is used to detect the faults in the residual vector obtained from the MSPCA model, by applying the MW-GLRT chart on each residual vector, $\bar{E}$.
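As a rough illustration of applying the MW-GLRT chart to a residual sequence, the sketch below sums the squared scaled residuals over each window and compares the result with a chi-square limit whose degree of freedom equals the window length. It assumes a scalar residual per observation with known standard deviation `sigma`; the helper names, the sample residuals, and the 99% confidence level are hypothetical. The chi-square quantile is obtained from a series for the incomplete gamma function plus bisection purely to keep the example dependency-free; a statistics library would normally be used instead.

```python
import math


def chi2_cdf(x, dof):
    """Chi-square CDF via the series for the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_quantile(p, dof):
    """Invert chi2_cdf by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, dof) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mw_glrt(residuals, window, sigma=1.0):
    """Sum of squared scaled residuals over each full window.

    Entry k of the result belongs to observation index k + window - 1.
    """
    squared = [(r / sigma) ** 2 for r in residuals]
    return [sum(squared[i - window + 1:i + 1]) for i in range(window - 1, len(squared))]


# Hypothetical residuals: a shift appears at the fourth observation.
residuals = [0.1, -0.2, 0.3, 2.5, 2.4, 2.6, 0.0, -0.1]
window = 3
limit = chi2_quantile(0.99, window)  # degree of freedom = window length
alarms = [stat > limit for stat in mw_glrt(residuals, window)]
```

Setting `window = 1` reduces the statistic to the single-observation GLRT with one degree of freedom. Note that the first `window - 1` observations produce no statistic in this sketch, which is one possible convention for handling the start of the sequence.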
Algorithm 2: MSPCA-based MW-GLRT fault detection algorithm.
Input: n × m data matrix X, confidence interval α.

Training data
1. For each column (i.e., process variable) in the data matrix, compute the wavelet decomposition;
2. Compute the mean and standard deviation of all process variables, and standardize the data matrix;
3. Each variable is decomposed into wavelet coefficients;
   a. A matrix of wavelet coefficients at each scale is formed;
   b. PCA is carried out on each of these scales;
   c. Using either the T2 or Q statistic, threshold limits for each of the scales are computed in order to threshold and retain wavelet coefficients at each scale;
4. The data matrix is reconstructed using retained wavelet scales and coefficients;
5. PCA is carried out on the reconstructed training data matrix in order to obtain an approximate data matrix and residuals;

Testing data
1. The data matrix is standardized using the mean and standard deviation of the fault-free variables computed in the training data set;
2. Each variable is decomposed into wavelet coefficients;
Fig. 4. Schematic illustration of proposed MSPCA-based MW-GLRT algorithm.

In the MSPCA model, applying the PCA model at every scale after wavelet decomposition is a crucial step. It is possible to prefilter data using multiscale representation, and then carry out fault
detection using the PCA model [43]. This means that the filtering and fault detection tasks are implemented as two independent tasks in the fault detection algorithm. However, implementing the filtering and fault detection tasks simultaneously does show improved detection results [44]. MSPCA achieves this by applying PCA on every scale after wavelet decomposition, and once again applying PCA after reconstructing the data back to the time domain. The wavelet coefficients inherently satisfy the assumption that the data are independent and Gaussian, and therefore improves the detection ability of the PCA model at multiple scales. Using the univariate Shewhart chart as an example, the authors in [44] demonstrate the advantage of applying the filtering and fault detection tasks simultaneously in a single algorithm. The multiscale Shewhart chart is able to provide lower missed detection and false alarm rates, when compared to the Shewhart chart pre-treated with multiscale filtering, as well as the conventional Shewhart chart. Furthermore, using the multivariate PCA chart as an example, Bakshi demonstrates that the MSPCA algorithm, provides lower ARL1 values when compared to the conventional PCA method [20]. In addition to the PCA chart, a number of fault detection techniques assume that data collected from sensors are Gaussian, independent and only contain a moderate level of noise. As previously stated, multiscale wavelet-based representation can help address these issues, as wavelet coefficients inherently possess the ability to denoise signals, decorrelate autocorrelated signals, and force data to be Gaussian at multiple scales, and has shown improved performance when utilized. Analysis using multiscale wavelet-based data representation to address violations in these assumptions and provide improved fault detection performance, compared to their conventional techniques can be found in literature [44–46]. 
The results in [45] conclude that multiscale methods are able to provide significantly lower missed detection rates, with false alarm rates and ARL1 values comparable to those of their conventional forms, which encourages their implementation, especially under violation of the stated assumptions. The next two sub-sections examine two important aspects of the MSPCA-based MW-GLRT technique, which are the selection of the length of the moving window, and the performance of the MW-GLRT chart when dealing with autocorrelated data.

4.1. Selection of moving window length

Moving window techniques are often used to smooth noisy data by filtering, and this is done through either mean or exponentially weighted filters [47]. Simple moving window techniques have been used for numerous applications including forecasting and GPS fault detection [48,49]. When utilizing moving average charts, a main assumption is that the data being filtered is Gaussian. When this assumption is violated, computation of the fault detection limits can be tricky and most solutions can be computationally expensive, e.g., the authors in [49] utilize a Markov chain approach in order to compute the limits. The threshold for the MSPCA-based MW-GLRT technique proposed in this paper is relatively straightforward to compute, only requiring the length of the moving window to be known. However, selecting the appropriate window length to utilize when applying MSPCA-based MW-GLRT is important. When applying a window-based GLRT technique, published literature only evaluates performance using one fault detection metric, the out-of-control average run length (ARL1) [25]. The claim is that increasing the window length indefinitely improves the fault detection performance (as seen through a decrease in the ARL1 values). This decrease can be attributed to the window-based GLRT gaining process memory when compared to the conventional GLRT technique. Although ARL1 improves, and quicker detection is possible when using a window-based GLRT technique, this might not necessarily be true for the missed detection and false alarm rates. Therefore, it is essential to evaluate the fault detection performance of the MW-GLRT technique using all three available performance detection metrics: missed detection rate, false alarm rate and ARL1. As will be demonstrated in Section 5, through illustrative examples, as the window length increases the ARL1 is known to decrease, but a longer window leads to an increase in the false alarm rate, and may in certain cases lead to a decrease in the missed detection rate as well.
Although quicker detection may be the most important performance criterion for many industrial applications, there may be a few applications where the false alarm or missed detection rates are equally vital. Therefore, the window length needs to be selected based on the given application, keeping all three criteria and their tolerance levels in mind. This paper utilizes the following procedure to decide what window length to use when applying the MSPCA-based MW-GLRT technique. Most industrial process models have commonly occurring faults, or faults that are extremely severe, and data for these cases are required when selecting the window size, as illustrated in Algorithm 3.
Algorithm 3: Selection of window length algorithm.

Training the model – Part I
1. Use training data collected under normal operating conditions to obtain process residuals using the MSPCA model.
2. Apply a range of window sizes to these process residuals and ensure that the in-control average run length, ARL0, for all window sizes is constant. This is accomplished by selecting the threshold limit (e.g., 99%) from a chi-square distribution, with the degree of freedom equivalent to the window length being utilized.

Testing the model – Part II
1. Utilize available data for the most commonly occurring faults, and the most severe faults, and extract process residuals for these cases using the MSPCA model.
2. Apply the same range of window sizes to these process residuals that were used during the training phase.
3. Make a comparison between the fault detection performances obtained when using different window sizes, and select a window size that is able to provide suitable performance results keeping all three performance detection criteria in mind. A compromise is required in most cases, depending on the application and the degree of tolerance with respect to a detection index that the process engineer is willing to endure. A particular window size is considered acceptable if it does not exceed the maximum missed detection rate, false alarm rate, and ARL1 value specified by the process engineer.
4. Apply the optimum and acceptable window sizes for testing purposes to detect faults.
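The comparison in Part II of Algorithm 3 could be organized along the following lines. This is a sketch under several assumptions: the residuals are simulated unit-variance Gaussian values rather than MSPCA residuals, the fault is a hypothetical +2 step, the tolerances passed to `acceptable_windows` are made up, and ARL1 is taken as the number of observations from fault onset to the first alarm. The limits in `CHI2_99` are the standard tabulated 99% chi-square quantiles for 1, 2, 4 and 8 degrees of freedom.

```python
import random

# Standard 99% chi-square quantiles, keyed by degrees of freedom (= window length).
CHI2_99 = {1: 6.635, 2: 9.210, 4: 13.277, 8: 20.090}


def alarms_for_window(residuals, window, threshold):
    """Flag observation i when the sum of the last `window` squared residuals exceeds the limit."""
    squared = [r * r for r in residuals]
    flags = [False] * len(residuals)
    for i in range(window - 1, len(residuals)):
        flags[i] = sum(squared[i - window + 1:i + 1]) > threshold
    return flags


def detection_metrics(flags, fault_start, fault_end):
    """Missed detection rate (%), false alarm rate (%), and ARL1 for a fault on [fault_start, fault_end)."""
    faulty = flags[fault_start:fault_end]
    normal = flags[:fault_start] + flags[fault_end:]
    md = 100.0 * faulty.count(False) / len(faulty)
    fa = 100.0 * normal.count(True) / len(normal)
    arl1 = faulty.index(True) + 1 if True in faulty else None
    return md, fa, arl1


def acceptable_windows(residuals, fault_start, fault_end, limits, max_md, max_fa, max_arl1):
    """Evaluate every window size and mark those within the engineer's tolerances."""
    results = {}
    for window, threshold in limits.items():
        flags = alarms_for_window(residuals, window, threshold)
        md, fa, arl1 = detection_metrics(flags, fault_start, fault_end)
        ok = md <= max_md and fa <= max_fa and arl1 is not None and arl1 <= max_arl1
        results[window] = {"MD": md, "FA": fa, "ARL1": arl1, "acceptable": ok}
    return results


# Simulated testing residuals with a hypothetical step fault (illustration only).
rng = random.Random(0)
residuals = [rng.gauss(0.0, 1.0) for _ in range(512)]
for i in range(250, 350):
    residuals[i] += 2.0
report = acceptable_windows(residuals, 250, 350, CHI2_99, max_md=20.0, max_fa=5.0, max_arl1=10)
for window, row in sorted(report.items()):
    print(window, row)
```

In this sketch, windows that still contain faulty samples just after the fault ends can raise alarms in the fault-free region, which is one way a longer window can inflate the false alarm rate.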
4.2. Assessing the impact of autocorrelated residuals on the MW-GLRT technique

One fundamental assumption in the derivation of the MW-GLRT chart is that there is independence between the lagged process residuals obtained from the MSPCA model. This assumption comes from the original GLRT derivation, where residuals are assumed to be independent. However, in practice, especially for dynamic processes, PCA and MSPCA may not be able to fully extract the dynamic component. This can lead to significant autocorrelation in the process residuals. Therefore, it is important to examine how the performance of the MW-GLRT is affected under violation of the independence assumption, and a performance comparison between independent and autocorrelated process residuals is required. In order to carry out this assessment, 1024 independent residuals are obtained using a standard normal distribution, and 1024 autocorrelated process residuals are obtained using the following autoregressive AR(1) model [50]:

$$x_t = a\, x_{t-1} + \varepsilon, \tag{16}$$

where a is the autoregressive coefficient, which can take values from 0 to 1 depending on the required degree of autocorrelation, while ε is white noise that follows a standard normal distribution with zero mean and unit variance. Both sets of process residuals (independent and autocorrelated) are split into training and testing data sets of 512 observations each, in order to carry out fault detection. A step fault of magnitude +2 is then added to the testing process residuals, between observations 251:350, for both sets of residuals (independent and autocorrelated). The MW-GLRT chart is then utilized to test its fault detection performance under both conditions. A window size of 4 was utilized as it provides reasonable detection for this particular fault size. In order to provide a thorough assessment of the effect of autocorrelation, the simulation is repeated for the entire range of autocorrelation (a = 0.01–0.99). A Monte-Carlo simulation of 5000 realizations is carried out for each degree of autocorrelation, so that the results are reliable and meaningful conclusions can be drawn. The results of the simulation for this performance comparison are illustrated in Fig. 5. As Fig. 5 demonstrates, the fault detection metric that sees the largest change due to the assumption violation is the missed detection rate, with a difference of nearly 25%, from an initial value of 9% (at low autocorrelation) to a value of 34% (at high autocorrelation) (see Fig. 5(a)). However, the false alarm rate and ARL1 values remain relatively constant for a much wider range of autoregressive coefficients, as seen in Fig. 5(b) and (c), respectively. From these results it can be concluded that the fault detection performance of the MW-GLRT does get affected when the process residuals violate the assumption of independence. However, for low to moderate autocorrelation values the performance is comparable to the case when the process residuals are independent. It is important to note that extremely high autocorrelation values do not occur frequently in practice, and that the MW-GLRT is therefore applicable for a wide range of autocorrelation values.

It is important to assess the performance of the proposed technique for fault detection purposes to encourage its application in practice. Therefore, in Section 5, the performance of the developed MSPCA-based MW-GLRT technique is assessed, and compared to the conventional PCA and MSPCA methods, and other PCA-based methods through two examples, one using synthetic data and the other using simulated TEP data.
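The data generation for the autocorrelation study of Section 4.2 can be sketched as follows. The AR(1) recursion, the 512/512 split, and the +2 step fault on observations 251:350 follow the description in the text; the zero initial condition, the random seed, the value a = 0.5, and the standardization of the testing residuals with training statistics are assumptions made for this illustration only.

```python
import random
import statistics


def simulate_ar1(n, a, rng):
    """x_t = a * x_{t-1} + eps_t with standard normal eps_t and x_0 = 0 (assumed start)."""
    x, previous = [], 0.0
    for _ in range(n):
        previous = a * previous + rng.gauss(0.0, 1.0)
        x.append(previous)
    return x


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation, used here only to check the simulated series."""
    mean = sum(x) / len(x)
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, len(x)))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


rng = random.Random(1)
series = simulate_ar1(1024, a=0.5, rng=rng)
training, testing = series[:512], series[512:]

# Assumption: scale the testing residuals with the training mean and standard deviation.
mu, sd = statistics.mean(training), statistics.stdev(training)
testing_scaled = [(v - mu) / sd for v in testing]

# Step fault of magnitude +2 on observations 251:350 (zero-based indices 250..349).
faulty = list(testing_scaled)
for i in range(250, 350):
    faulty[i] += 2.0
```

Repeating this for a grid of coefficients a and many random realizations, and then applying the MW-GLRT chart with a window of 4 to each `faulty` series, would reproduce the structure of the Monte-Carlo study described above, though not necessarily its exact numbers.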
Fig. 5. Impact of autocorrelated residuals on the MW-GLRT technique.
5. Illustrative examples
In this section, the fault detection performance of the developed MSPCA-based MW-GLRT algorithm is evaluated using three fault detection criteria: the missed detection (MD) rate, the false alarm (FA) rate, and the out-of-control average run length (ARL1) [38]. The missed detection rate is the percentage of observations in the faulty region that go undetected, while the false alarm rate is the percentage of observations in the non-faulty region that are incorrectly declared faulty. ARL1 is the number of observations it takes for a particular technique to detect a fault in the faulty region after it has been introduced, i.e., the speed of detection. In order to thoroughly assess the performance of the developed technique, a comparison is also made with two other existing PCA-based fault detection techniques: the filtered SPE statistic and the Di index [51]. The authors who developed the filtered SPE and Di index state that these methods are able to detect smaller faults, and they were therefore chosen for comparison with the developed MSPCA-based MW-GLRT technique [51]. The filtered SPE statistic is derived by applying an exponentially weighted moving filter to the residuals from the PCA model. The filtered SPE statistic is computed as follows [51]:

\[ \bar{e}(k) = (I - \beta)\,\bar{e}(k-1) + \beta\,e(k), \tag{17} \]

where $\bar{e}(0) = 0$ and $\beta$ is a diagonal matrix defined as follows [51]:

\[ \beta = \gamma I, \tag{18} \]

where $0 < \gamma < 1$ is a forgetting factor. Qin defines the limits for the filtered SPE chart as follows [52]:

\[ \bar{\delta}_{\alpha}^{2} = \frac{\gamma}{2 - \gamma}\,\delta_{\alpha}^{2}, \tag{19} \]
where $\delta_{\alpha}^{2}$ is the non-filtered control limit. Meanwhile, the Di index can be computed using the last principal component as follows [51]:

\[ D_i(k) = \sum_{j=m-i+1}^{m} \tilde{t}_j^{2}, \quad i = 1, 2, \ldots, (m-1). \tag{20} \]

The process is considered out of control when [51]:

\[ D_i(k) > \tau_{i,\alpha}^{2}, \tag{21} \]

and the control limits can be computed using the $\chi^{2}$ distribution with $h(i)$ degrees of freedom as follows [51]:

\[ \tau_{i,\alpha}^{2} = g(i)\,\chi_{h(i),\alpha}^{2}, \tag{22} \]
Fig. 6. Monitoring a fault of magnitude unity using PCA and MSPCA-based T2 charts.
Fig. 7. Monitoring a fault of magnitude unity using PCA and MSPCA-based Q charts.
where parameters $g(i)$ and $h(i)$ are computed using the following equations [51]:

\[ g(i) = \frac{\sum_{j=m-i+1}^{m} \lambda_j^{2}}{\sum_{j=m-i+1}^{m} \lambda_j}, \qquad h(i) = \frac{\left(\sum_{j=m-i+1}^{m} \lambda_j\right)^{2}}{\sum_{j=m-i+1}^{m} \lambda_j^{2}}. \tag{23} \]

The following subsections compare the performance of the conventional PCA and MSPCA-based T2, Q, and GLRT charts, with the conventional PCA-based filtered SPE statistic and Di index, as well as the developed MSPCA-based MW-GLRT charts.

5.1. Simulated synthetic data

The simulated synthetic example replicates and extends the illustrative example carried out in the original MSPCA paper [20]. Two variables are generated using Gaussian measurements that are uncorrelated, of zero mean and unit variance. The final variables are generated by adding and subtracting the first two variables, respectively, as shown in the following equations [20]:

\[ \begin{aligned} \tilde{x}_1(t) &= N(0, 1), \\ \tilde{x}_2(t) &= N(0, 1), \\ \tilde{x}_3(t) &= \tilde{x}_1(t) + \tilde{x}_2(t), \\ \tilde{x}_4(t) &= \tilde{x}_1(t) - \tilde{x}_2(t). \end{aligned} \tag{24} \]

The measured data matrix, $\tilde{X}$ (of four variables), is then contaminated by white noise, that is uncorrelated Gaussian error, of zero mean and standard deviation of 0.2 as follows [20]:

\[ X(t) = \tilde{X}(t) + 0.2\,N(0, 1). \tag{25} \]
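The data generation in Eqs. (24) and (25) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation; the function name `generate_synthetic_data`, the fixed seed and the choice to draw independent noise for each variable are assumptions made here.

```python
import random


def generate_synthetic_data(n, rng, noise_std=0.2):
    """Return n noisy observations [x1, x2, x3, x4] following Eqs. (24)-(25)."""
    data = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        clean = [x1, x2, x1 + x2, x1 - x2]
        # Independent measurement noise on each variable (assumed), Eq. (25).
        data.append([v + noise_std * rng.gauss(0.0, 1.0) for v in clean])
    return data


rng = random.Random(0)  # fixed seed (an assumption) so the sketch is repeatable
training = generate_synthetic_data(2048, rng)
print(len(training), len(training[0]))
```

Because the third and fourth variables are linear combinations of the first two, the noise-free data have rank two, which is what a PCA model with two retained components would be expected to capture.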
Normal operating conditions consist of 2048 equally spaced observations. Abnormal operation (also of 2048 observations) consists of a step change in the mean of all four variables between observations 1001 and 1300 (the faulty region is highlighted in light blue in all charts). The performance of the MSPCA-based MW-GLRT chart is illustrated and compared to the conventional PCA-based and MSPCA-based methods for two different cases.

5.1.1. Case 1

This case considers a mean shift of magnitude unity, which is large enough for reasonable detection by PCA and MSPCA. The fault detection results of the PCA and MSPCA methods are shown in Figs. 6 and 7 for the T2 and Q charts, respectively. As shown in Fig. 6(a), the PCA-based T2 chart is unable to detect most of the fault (missed detection rate of 99%), while the MSPCA-based T2 chart (see Fig. 6(b)) has a lower missed detection rate (77%). The same trend is seen for the PCA-based and MSPCA-based Q charts (see Fig. 7): the MSPCA-based Q chart provides a missed detection rate of 3%, while the PCA-based Q chart has a missed detection rate of 78%. Fig. 8 shows the PCA-based filtered SPE and Di index charts. The filtered SPE chart (Fig. 8(a)) detects more of the fault than the Di index chart (Fig. 8(b)). The filtered SPE chart is
Fig. 8. Monitoring a fault of magnitude unity using PCA-based filtered SPE and Di index charts.
Fig. 9. Monitoring a fault of magnitude unity using PCA and MSPCA-based GLRT charts.
Fig. 10. Monitoring a fault of magnitude unity using MSPCA-based MW-GLRT charts.
also able to provide similar detection to that of the MSPCA-based Q chart (Fig. 7(b)), but does return a higher false alarm rate. Fig. 9 shows the fault detection results using the GLRT chart applied using the PCA and MSPCA-based methods, and both charts show similar missed detection rates of approximately 35%. Fig. 10 shows the developed MSPCA-based MW-GLRT charts for Case 1. Using Algorithm 3, detailed in Section 4.1, the optimal window length was found to be 50. However, a second window length of 4 was
included in the analysis to demonstrate the difference in performance obtained when different window lengths are used. Both MSPCA-based MW-GLRT charts (Fig. 10(a) and (b)) produce better detection than all other charts, as both are able to detect almost 100% of the fault. It should also be noted that the false alarm rates of the MSPCA-based MW-GLRT charts are lower than that obtained from the PCA-based filtered SPE chart.
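The effect of the window length on the three detection metrics can be illustrated with a simplified sketch. Several assumptions are made here and none of them comes from the paper's algorithms: the residuals are scalar white noise rather than MSPCA residuals, the statistic is taken as the sum of squared residuals in the window (a mean filter on the squares, up to scaling), and the detection limit is an empirical quantile of the fault-free statistic instead of the GLRT threshold. All function names are hypothetical.

```python
import random


def mw_statistic(residuals, window):
    """Sum of squared residuals over the last `window` samples (fewer at the start)."""
    squares = [r * r for r in residuals]
    out, running = [], 0.0
    for k, s in enumerate(squares):
        running += s
        if k >= window:
            running -= squares[k - window]  # drop the sample leaving the window
        out.append(running)
    return out


def empirical_limit(stat, alpha=0.05):
    """(1 - alpha) empirical quantile of a fault-free statistic (assumed rule)."""
    ordered = sorted(stat)
    return ordered[min(len(ordered) - 1, int((1.0 - alpha) * len(ordered)))]


def detection_metrics(stat, limit, start, stop):
    """MD rate (%), FA rate (%) and ARL1 for a fault on start..stop (1-based)."""
    alarms = [s > limit for s in stat]
    faulty = alarms[start - 1:stop]
    normal = alarms[:start - 1] + alarms[stop:]
    md = 100.0 * faulty.count(False) / len(faulty)
    fa = 100.0 * normal.count(True) / len(normal)
    arl1 = faulty.index(True) + 1 if True in faulty else None
    return md, fa, arl1


rng = random.Random(1)  # fixed seed (an assumption) so the sketch is repeatable
training = [rng.gauss(0.0, 1.0) for _ in range(2048)]
testing = [rng.gauss(0.0, 1.0) + (1.0 if 1001 <= k <= 1300 else 0.0)
           for k in range(1, 2049)]
for window in (4, 50):
    limit = empirical_limit(mw_statistic(training, window))
    md, fa, arl1 = detection_metrics(mw_statistic(testing, window), limit, 1001, 1300)
    print(window, round(md, 1), round(fa, 1), arl1)
```

A longer window averages more squared residuals, which tends to lower the missed detection rate for small sustained shifts, but it also keeps the statistic elevated for some time after the fault ends, which is one way false alarms can increase.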
Fig. 11. Monitoring a fault of magnitude 0.3 using PCA and MSPCA-based T2 charts.
Fig. 12. Monitoring a fault of magnitude 0.3 using PCA and MSPCA-based Q charts.
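For reference, the two PCA-based baselines used in both cases (Eqs. (17)–(23)) can be sketched as follows. This is an illustration under assumptions, not the implementation of [51] or [52]: the filtered SPE is taken as the squared norm of the filtered residual vector, `gamma` stands for the forgetting factor, the eigenvalues are assumed to be sorted in descending order so that the last i entries correspond to j = m−i+1, ..., m, and the chi-square quantile of Eq. (22) is left out. All function names are hypothetical.

```python
def filtered_spe(residuals, gamma):
    """SPE of e_bar(k) = (1 - gamma) * e_bar(k-1) + gamma * e(k), e_bar(0) = 0."""
    e_bar = [0.0] * len(residuals[0])
    spe = []
    for e in residuals:
        e_bar = [(1.0 - gamma) * b + gamma * v for b, v in zip(e_bar, e)]
        spe.append(sum(b * b for b in e_bar))
    return spe


def filtered_limit(limit, gamma):
    """Scale the non-filtered SPE limit by gamma / (2 - gamma), as in Eq. (19)."""
    return gamma / (2.0 - gamma) * limit


def di_index(scores, i):
    """D_i(k): sum of squares of the last i principal component scores, Eq. (20)."""
    return sum(t * t for t in scores[-i:])


def di_parameters(eigenvalues, i):
    """g(i) and h(i) of Eq. (23) from the i smallest eigenvalues."""
    tail = eigenvalues[-i:]
    s1 = sum(tail)
    s2 = sum(v * v for v in tail)
    return s2 / s1, s1 * s1 / s2


print(filtered_spe([[2.0], [2.0]], 0.5))
print(di_parameters([4.0, 2.0, 1.0, 1.0], 2))
```

When the i smallest eigenvalues are equal, g(i) reduces to that common eigenvalue and h(i) to i, which is a quick sanity check on the weighted chi-square approximation.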
5.1.2. Case 2

This case considers a mean shift of magnitude 0.3. This fault is only 30% of the standard deviation of all variables, and most known conventional techniques fail to detect it. The fault detection performance of the PCA and MSPCA methods is shown in Figs. 11 and 12 for the T2 and Q charts, respectively. Both PCA-based T2 and Q charts fail to detect most of the fault (as seen in Figs. 11(a) and 12(a)). The MSPCA-based Q chart (Fig. 12(b)) does show slightly improved performance over the conventional PCA-based Q chart (Fig. 12(a)). However, the PCA-based filtered SPE and Di index charts (Fig. 13) are unable to perform well, as most of the fault goes undetected, since the fault size is too small for efficient detection. The filtered SPE chart (Fig. 13(a)) does perform slightly better than the Di index chart (Fig. 13(b)). The GLRT charts applied using the PCA-based and MSPCA-based methods (Fig. 14) show similar missed detection rates (approximately 93%). However, the performance is further improved through the application of the MSPCA-based MW-GLRT charts (see Fig. 15). The MW-GLRT chart with a window size of 50 (Fig. 15(b)) achieves a missed detection rate of approximately 27%, which is relatively low when compared to all other charts. In order to ensure that meaningful conclusions can be drawn, a Monte-Carlo simulation of 5000 realizations was carried out. The fault detection results are summarized in Table 1. These results show
that the MSPCA-based MW-GLRT chart is able to significantly reduce both the missed detection rate and ARL1 values. The results in Table 1 are further summarized as bar plots in Figs. 16–18 for the missed detection rate, false alarm rate and ARL1, respectively. These results show that although the lowest missed detection rates and ARL1 values are provided by the developed MW-GLRT technique, it does lead to a slightly elevated false alarm rate, as shown in Fig. 15. This can be attributed to two reasons: the Gibbs phenomenon (the production of artifacts near discontinuities during reconstruction), and the fact that all observations in a moving window are given equal weight [25]. However, it can be argued that the reduction in the missed detection rate and ARL1 outweighs the slight increase in the false alarm rate, and its implementation should therefore be pursued. It should also be noted that the MW-GLRT chart does perform better than the other PCA-based charts (filtered SPE and Di index), with lower false alarm rates (see Fig. 17). Next, the developed MSPCA-based MW-GLRT fault detection algorithm is illustrated through its application to the Tennessee Eastman Process.

5.2. Tennessee Eastman Process

To further validate the developed MSPCA-based MW-GLRT fault detection method, it needs to be tested on process data from
Fig. 13. Monitoring a fault of magnitude 0.3 using PCA-based filtered SPE and Di index charts.
Fig. 14. Monitoring a fault of magnitude 0.3 using PCA and MSPCA-based GLRT charts.
Fig. 15. Monitoring a fault of magnitude 0.3 using MSPCA-based MW-GLRT charts.
a plant. The TEP is a well-defined simulation of a chemical process that has been commonly used in process monitoring research [6,53–56]. The process consists of five main units: a reactor, a condenser, a compressor, a stripper and a vapor/liquid separator. Details of the process, including the reactions, are widely available in the literature [53,54] (Fig. 19).
Table 1
Summary of missed detection rates, false alarm rates, and ARL1 for simulated data.

| Chart | Case 1: Missed Detection Rate (%) | Case 1: False Alarm Rate (%) | Case 1: ARL1 | Case 2: Missed Detection Rate (%) | Case 2: False Alarm Rate (%) | Case 2: ARL1 |
|---|---|---|---|---|---|---|
| Conventional PCA-based T2 | 99.0549 | 0.1339 | 88.7818 | 99.7477 | 0.1889 | 131.3441 |
| MSPCA-based T2 | 77.7263 | 0.2618 | 50.9399 | 99.6412 | 0.3169 | 138.3850 |
| Conventional PCA-based Q | 78.2243 | 0.0686 | 4.5884 | 97.3838 | 0.7939 | 37.3126 |
| MSPCA-based Q | 3.0294 | 0.8027 | 1.1560 | 89.0601 | 1.4075 | 50.8664 |
| Conventional PCA-based Filtered SPE | 0.1702 | 10.1617 | 1.5040 | 43.9713 | 10.0161 | 4.9254 |
| Conventional PCA-based Di Index | 46.8615 | 7.2726 | 3.4034 | 87.3175 | 7.2730 | 14.2550 |
| Conventional PCA-based GLRT | 34.9970 | 2.7785 | 1.5455 | 93.0288 | 2.7793 | 14.2550 |
| MSPCA-based GLRT | 35.3993 | 2.6869 | 1.5596 | 93.0627 | 2.7767 | 14.3336 |
| MSPCA-based MW-GLRT (WL = 4) | 0.7717 | 4.0934 | 1.0090 | 84.6234 | 3.9759 | 13.8600 |
| MSPCA-based MW-GLRT (WL = 50) | 0.8453 | 7.0707 | 1.0000 | 27.0466 | 5.4873 | 4.7986 |

Table 2
Summary of process faults for Tennessee Eastman Process.

| Fault Number | Process Variable | Type |
|---|---|---|
| IDV(1) | A/C feed ratio, B composition constant | Step |
| IDV(2) | B composition, A/C ratio constant | Step |
| IDV(3) | D feed temperature | Step |
| IDV(4) | Reactor cooling water inlet temperature | Step |
| IDV(5) | Condenser cooling water inlet temperature | Step |
| IDV(6) | A feed loss | Step |
| IDV(7) | C header pressure loss-reduced availability | Step |
| IDV(8) | A, B, and C feed composition | Random variation |
| IDV(9) | D feed temperature | Random variation |
| IDV(10) | C feed temperature | Random variation |
| IDV(11) | Reactor cooling water inlet temperature | Random variation |
| IDV(12) | Condenser cooling water inlet temperature | Random variation |
| IDV(13) | Reaction kinetics | Slow drift |
| IDV(14) | Reactor cooling water valve | Sticking |
| IDV(15) | Condenser cooling water valve | Sticking |
| IDV(16) | Unknown | Unknown |
| IDV(17) | Unknown | Unknown |
| IDV(18) | Unknown | Unknown |
| IDV(19) | Unknown | Unknown |
| IDV(20) | Unknown | Unknown |
| IDV(21) | The valve fixed at steady state position | Constant position |

Fig. 16. Summary of missed detection rates.

Fig. 17. Summary of false alarm rates.

Process data for training and testing purposes are available [26]. A number of faults can be tested, and these are listed in Table 2. For each of the given testing data sets, the fault was introduced at observation 224 and continued until the end of the data set (the faulty region is highlighted in light blue in the figures illustrating the simulation). The missed detection rates (%), false alarm rates (%), and ARL1 values obtained for all faults are summarized in Tables 3–5, respectively. The results obtained using TEP data are consistent with those obtained from the example that uses simulated synthetic data. It should be noted that the PCA-based filtered SPE and Di index charts return higher false alarm rates
compared to the other charts for all faults, which can be explained by the PCA model not being able to model the dynamic TEP data efficiently. On the other hand, the MSPCA-based MW-GLRT technique provides significantly lower missed detection rates (see Table 3) and ARL1 values (see Table 5) than most of the other techniques for a number of the faults. However, the MSPCA-based
Table 3
Summary of missed detection rates for TEP data.

| Fault Number | Conventional PCA-based T2 | MSPCA-based T2 | Conventional PCA-based Q | MSPCA-based Q | Conventional PCA-based Filtered SPE | Conventional PCA-based Di Index | Conventional PCA-based GLRT | MSPCA-based GLRT | MSPCA-based MW-GLRT (WL = 4) | MSPCA-based MW-GLRT (WL = 8) |
|---|---|---|---|---|---|---|---|---|---|---|
| IDV(1) | 1.0000 | 42.2500 | 0.1250 | 0.0000 | 0.1250 | 6.2500 | 0.2500 | 1.2500 | 0.0000 | 0.0000 |
| IDV(2) | 1.6250 | 1.7500 | 2.7500 | 0.0000 | 0.3750 | 0.5000 | 3.7500 | 9.0000 | 2.6250 | 0.5000 |
| IDV(3) | 93.7500 | 80.5000 | 90.3700 | 18.1200 | 52.8750 | 11.2500 | 98.3750 | 99.2500 | 88.3750 | 80.7500 |
| IDV(4) | 92.7500 | 83.7500 | 93.1250 | 24.1250 | 69.3750 | 13.5000 | 99.3750 | 99.8750 | 91.3750 | 87.3750 |
| IDV(5) | 77.7500 | 61.7500 | 78.8750 | 21.7500 | 48.7850 | 12.000 | 80.6250 | 79.7500 | 70.7500 | 67.0000 |
| IDV(6) | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.1250 | 0.5000 | 0.1250 | 0.0000 |
| IDV(7) | 76.1250 | 40.1250 | 74.1250 | 10.0000 | 34.5000 | 10.1250 | 75.0000 | 64.3750 | 54.6250 | 45.2500 |
| IDV(8) | 15.8750 | 0.2500 | 22.5000 | 2.0000 | 2.2125 | 2.2500 | 16.2500 | 14.0000 | 5.3750 | 2.3750 |
| IDV(9) | 94.0000 | 79.7500 | 93.2500 | 16.7500 | 60.3750 | 14.1250 | 99.2500 | 98.7500 | 94.2500 | 90.2500 |
| IDV(10) | 85.8750 | 32.2500 | 52.6250 | 2.0000 | 25.1250 | 7.0000 | 25.1250 | 89.6250 | 32.1250 | 26.000 |
| IDV(11) | 57.5000 | 54.2500 | 89.8750 | 40.6250 | 56.5000 | 16.2500 | 97.8750 | 98.3750 | 92.1250 | 87.7500 |
| IDV(12) | 32.2500 | 0.5000 | 19.3750 | 0.3750 | 0.8750 | 3.6250 | 10.2500 | 9.8750 | 1.5000 | 0.2500 |
| IDV(13) | 14.2500 | 5.0000 | 6.2500 | 2.0000 | 3.8750 | 1.0000 | 5.6250 | 4.8750 | 4.1250 | 3.5000 |
| IDV(14) | 0.6250 | 0.0000 | 9.5000 | 12.0000 | 14.0000 | 1.0000 | 12.5000 | 14.7500 | 0.0000 | 0.0000 |
| IDV(15) | 94.1250 | 63.0000 | 89.0000 | 29.0000 | 56.0000 | 13.1250 | 98.5000 | 98.6250 | 80.6250 | 73.5000 |
| IDV(16) | 88.0000 | 41.1250 | 64.7850 | 2.8750 | 25.7500 | 8.3750 | 68.5000 | 74.2500 | 48.5000 | 36.8750 |
| IDV(17) | 17.5000 | 4.1250 | 14.0000 | 0.8750 | 0.7500 | 1.8750 | 16.1250 | 16.3750 | 5.0000 | 1.5000 |
| IDV(18) | 8.2500 | 2.1250 | 7.8750 | 0.0000 | 6.1250 | 2.3750 | 9.6250 | 10.1250 | 8.7500 | 6.7500 |
| IDV(19) | 79.3750 | 72.6250 | 88.7500 | 34.5000 | 58.5000 | 11.5000 | 91.8750 | 96.5000 | 90.5000 | 85.0000 |
| IDV(20) | 79.5000 | 36.7500 | 53.7500 | 2.2500 | 11.2500 | 11.2500 | 37.3750 | 41.8750 | 24.8750 | 19.0000 |
| IDV(21) | 50.1250 | 86.5000 | 36.5000 | 7.7500 | 12.3750 | 3.3750 | 44.3750 | 76.8750 | 33.5000 | 27.0000 |
Table 4
Summary of false alarm rates for TEP data.

| Fault Number | Conventional PCA-based T2 | MSPCA-based T2 | Conventional PCA-based Q | MSPCA-based Q | Conventional PCA-based Filtered SPE | Conventional PCA-based Di Index | Conventional PCA-based GLRT | MSPCA-based GLRT | MSPCA-based MW-GLRT (WL = 4) | MSPCA-based MW-GLRT (WL = 8) |
|---|---|---|---|---|---|---|---|---|---|---|
| IDV(1) | 0.0000 | 53.1250 | 1.3393 | 19.6429 | 21.8750 | 79.4643 | 1.3393 | 0.4464 | 1.3393 | 2.2321 |
| IDV(2) | 1.3393 | 37.5000 | 0.8929 | 34.3750 | 11.6071 | 79.0179 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| IDV(3) | 3.5714 | 37.5000 | 5.3571 | 66.5179 | 47.7679 | 83.4821 | 0.4464 | 0.0000 | 0.0000 | 0.0000 |
| IDV(4) | 1.7857 | 9.8214 | 3.5714 | 66.5179 | 22.7679 | 79.4643 | 0.8929 | 0.0000 | 0.0000 | 0.4464 |
| IDV(5) | 0.0000 | 40.1786 | 1.3393 | 53.1250 | 22.7679 | 79.4643 | 0.8929 | 0.0000 | 0.4464 | 1.7857 |
| IDV(6) | 63.3929 | 77.6786 | 27.3210 | 54.9107 | 13.8393 | 78.1250 | 0.0000 | 0.0000 | 0.0000 | 0.8929 |
| IDV(7) | 0.4464 | 46.4286 | 0.0000 | 50.8929 | 34.1429 | 83.0357 | 0.0000 | 0.0000 | 1.3393 | 6.2500 |
| IDV(8) | 0.0000 | 12.5000 | 0.4464 | 29.4643 | 29.9107 | 85.7143 | 0.8929 | 0.4464 | 2.6786 | 7.1429 |
| IDV(9) | 5.8036 | 48.2143 | 4.9107 | 64.2857 | 38.3929 | 81.6964 | 0.8929 | 4.0179 | 18.7500 | 21.8750 |
| IDV(10) | 0.4464 | 20.5357 | 0.0000 | 40.6250 | 15.1786 | 80.8036 | 0.0000 | 0.0000 | 0.8929 | 3.1250 |
| IDV(11) | 2.2321 | 19.6429 | 2.2321 | 39.2857 | 20.0893 | 82.5893 | 0.0000 | 0.0000 | 0.0000 | 1.3393 |
| IDV(12) | 0.0000 | 10.7143 | 0.0000 | 0.0000 | 24.1071 | 80.3571 | 0.0000 | 0.0000 | 0.0000 | 1.7857 |
| IDV(13) | 0.0000 | 2.2321 | 0.0000 | 2.6786 | 18.3036 | 78.5714 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| IDV(14) | 0.0000 | 0.0000 | 0.4464 | 16.9643 | 22.3214 | 79.0179 | 0.0000 | 0.0000 | 0.4464 | 2.2321 |
| IDV(15) | 2.6786 | 15.1786 | 3.5714 | 41.0714 | 24.5536 | 80.8036 | 0.0000 | 0.4464 | 0.0000 | 0.0000 |
| IDV(16) | 0.8929 | 43.7500 | 4.4643 | 60.7143 | 31.6964 | 84.8214 | 0.4464 | 0.0000 | 11.1607 | 21.8750 |
| IDV(17) | 0.8929 | 14.2857 | 3.1250 | 33.0357 | 62.0536 | 84.3750 | 1.3393 | 0.4464 | 0.8929 | 0.8929 |
| IDV(18) | 15.6250 | 25.8929 | 9.3750 | 49.5536 | 29.4643 | 83.9286 | 0.4464 | 0.0000 | 0.0000 | 0.0000 |
| IDV(19) | 0.4464 | 2.6786 | 0.8929 | 50.4464 | 29.0179 | 79.0179 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| IDV(20) | 0.4464 | 16.0714 | 0.4464 | 25.8929 | 16.9643 | 82.1429 | 0.4464 | 0.0000 | 0.0000 | 0.0000 |
| IDV(21) | 4.4643 | 24.1071 | 14.7321 | 74.5536 | 49.1071 | 87.0536 | 4.0179 | 0.4464 | 7.1429 | 14.2857 |
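Table 5
Summary of ARL1 for TEP data.

| Fault Number | Conventional PCA-based T2 | MSPCA-based T2 | Conventional PCA-based Q | MSPCA-based Q | Conventional PCA-based Filtered SPE | Conventional PCA-based Di Index | Conventional PCA-based GLRT | MSPCA-based GLRT | MSPCA-based MW-GLRT (WL = 4) | MSPCA-based MW-GLRT (WL = 8) |
|---|---|---|---|---|---|---|---|---|---|---|
| IDV(1) | 9 | 1 | 2 | 1 | 3 | 2 | 3 | 3 | 1 | 1 |
| IDV(2) | 13 | 5 | 6 | 1 | 4 | 1 | 23 | 16 | 9 | 5 |
| IDV(3) | 32 | 29 | 24 | 1 | 1 | 1 | 90 | 90 | 24 | 20 |
| IDV(4) | 1 | 1 | 1 | 1 | 1 | 1 | 317 | 161 | 59 | 55 |
| IDV(5) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 3 | 1 | 1 |
| IDV(6) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 5 | 2 | 1 |
| IDV(7) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| IDV(8) | 29 | 1 | 19 | 1 | 1 | 1 | 18 | 19 | 16 | 13 |
| IDV(9) | 1 | 1 | 3 | 1 | 1 | 1 | 165 | 1 | 1 | 1 |
| IDV(10) | 8 | 15 | 29 | 1 | 1 | 1 | 25 | 36 | 8 | 5 |
| IDV(11) | 7 | 9 | 22 | 1 | 1 | 1 | 41 | 74 | 189 | 186 |
| IDV(12) | 3 | 5 | 4 | 1 | 2 | 1 | 4 | 4 | 1 | 1 |
| IDV(13) | 50 | 33 | 39 | 17 | 1 | 1 | 39 | 37 | 34 | 29 |
| IDV(14) | 1 | 1 | 3 | 3 | 2 | 1 | 3 | 3 | 1 | 1 |
| IDV(15) | 78 | 25 | 67 | 1 | 1 | 2 | 573 | 69 | 89 | 76 |
| IDV(16) | 24 | 1 | 19 | 1 | 5 | 1 | 19 | 19 | 16 | 2 |
| IDV(17) | 2 | 17 | 20 | 1 | 1 | 1 | 20 | 20 | 17 | 13 |
| IDV(18) | 1 | 3 | 5 | 1 | 3 | 1 | 78 | 80 | 13 | 9 |
| IDV(19) | 11 | 1 | 13 | 1 | 5 | 1 | 12 | 69 | 10 | 7 |
| IDV(20) | 68 | 9 | 79 | 1 | 1 | 1 | 71 | 72 | 67 | 5 |
| IDV(21) | 21 | 31 | 41 | 1 | 8 | 1 | 243 | 242 | 232 | 29 |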
MW-GLRT technique does show a slightly elevated false alarm rate for a few of the faults (see Table 4), and this can again be attributed to the Gibbs phenomenon and to the fact that the moving window assigns equal weight to all observations in the window. Finally, it should be noted that the MSPCA-based MW-GLRT technique provides significantly lower false alarm rates than the MSPCA-based T2 and Q charts (see Table 4). These results are further illustrated in Figs. 20–24 for TEP fault 12. The MSPCA-based T2 and Q charts (Figs. 20(b) and 21(b)) outperform the conventional PCA-based T2 and Q charts (Figs. 20(a) and 21(a)), with lower missed detection rates and ARL1 values. The MSPCA-based MW-GLRT charts (Fig. 24) provide a further reduction in missed detection rates and ARL1 values compared to the T2, Q and GLRT charts of the conventional PCA and MSPCA-based methods. Using Algorithm 3 from Section 4.1, the optimal window length for monitoring was found to be 8. However, a second window length (of 4) was included in the performance analysis to demonstrate the difference in performance obtained when using different window lengths. It should also be noted that the PCA-based filtered SPE and Di index charts (Fig. 22) are able to detect most of the fault, but unfortunately return higher false alarm rates than the MSPCA-based MW-GLRT chart. The fault detection results of TE process monitoring show that the developed MSPCA-based MW-GLRT provides good results compared to the conventional methods in terms of missed detection rates and ARL1 values. This is because, in the MW-GLRT method, the detection statistic equals the norm of the residuals in the window, so the GLRT chart gains process memory, which improves detection. However, this also means that a proper moving window length needs to be selected, which is similar to estimating the mean filter length in data filtering; our procedure for selecting the window length is given in Algorithm 3 in Section 4.1. It should be noted that the two applications required different window lengths to operate effectively, which justifies selecting the window length according to the application.

Fig. 18. Summary of ARL1 values.

6. Conclusion
In this paper, a moving-window generalized likelihood ratio test (MW-GLRT) based on multiscale principal component analysis (MSPCA) is proposed for fault detection. This was accomplished by using the MSPCA method to develop a model, and then detecting faults by applying the MW-GLRT chart to the residuals obtained from the model. The performance of the developed technique was assessed and compared to conventional PCA-based and MSPCA-based techniques using two illustrative examples: simulated synthetic data and the Tennessee Eastman Process. In order to provide a more comprehensive analysis, two other PCA-based methods, the filtered SPE chart and the Di index chart, were also included. The results demonstrate the effectiveness of the MSPCA-based MW-GLRT technique over the PCA and MSPCA methods in terms of lower missed detection rates and ARL1 values. However, the Tennessee Eastman Process is nonlinear, open-loop unstable, and contains a mixture of fast and slow dynamics, which means that the linear PCA-based and MSPCA-based methods are not able to efficiently tackle the issue of nonlinearity. Thus, we have identified several directions for extending this work. First, we propose the development of a multiscale kernel PCA-based MW-GLRT algorithm, which will be able to obtain a more accurate representation of the principal components for a given set of data, and handle
Fig. 19. TEP process flow diagram.
Fig. 20. Monitoring TEP fault 12 using PCA and MSPCA-based T2 charts.
Fig. 21. Monitoring TEP fault 12 using PCA and MSPCA-based Q charts.
a wide range of nonlinearities using the kernel principal component analysis (KPCA) model. Second, since the results of the MSPCA-based MW-GLRT technique demonstrate the importance of selecting the length of the moving window accurately, we suggest
extending the MW-GLRT technique to one that assigns exponential weights to residuals in the moving window (instead of equal weightage) as it might be able to further improve fault detection performance by reducing the false alarm rate.
Fig. 22. Monitoring TEP fault 12 using PCA-based filtered SPE and Di index charts.
Fig. 23. Monitoring TEP fault 12 using PCA and MSPCA-based GLRT charts.
Fig. 24. Monitoring TEP fault 12 using MSPCA-based MW-GLRT charts.
Acknowledgements

This work was made possible by NPRP grant NPRP7-1172-2-439 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
References

[1] D. Zumoffen, M. Basualdo, From large chemical plant data to fault diagnosis integrated to decentralized fault-tolerant control: pulp mill process application, Ind. Eng. Chem. Res. 47 (2008) 1201–1220, http://dx.doi.org/10.1021/ie071064m.
[2] V. Venkatasubramanian, R. Rengaswamy, K. Yin, S.N. Kavuri, A review of process fault detection and diagnosis: part I: quantitative model-based methods, Comput. Chem. Eng. 27 (2003) 293–311.
[3] V. Venkatasubramanian, R. Rengaswamy, K. Yin, S.N. Kavuri, A review of process fault detection and diagnosis: part II: qualitative models and search strategies, Comput. Chem. Eng. 27 (2003) 313–326.
[4] V. Venkatasubramanian, R. Rengaswamy, K. Yin, S.N. Kavuri, A review of process fault detection and diagnosis: part III: process history based methods, Comput. Chem. Eng. 27 (2003) 327–346.
[5] M.F. Harkat, G. Mourot, J. Ragot, An improved PCA scheme for sensor FDI: application to an air quality monitoring network, J. Process Control. 16 (2006) 625–634, http://dx.doi.org/10.1016/j.jprocont.2005.09.007.
[6] D.X. Tien, K.-W. Lim, L. Jun, Comparative study of PCA approaches in process monitoring and fault detection, 30th Annu. Conf. IEEE Ind. Electron. Soc. 3 (2004) 2594–2599, http://dx.doi.org/10.1109/IECON.2004.1432212.
[7] J.P. George, Z. Chen, P. Shaw, Fault Detection of Drinking Water Treatment Process Using PCA and Hotelling’s T2 Chart, 2009, pp. 970–975.
[8] F. Janžekovič, T. Novak, PCA - a powerful method for analyze ecological niches, in: Princ. Compon. Anal. - Multidiscip. Appl., InTech, 2012, pp. 127–142, http://dx.doi.org/10.5772/38538.
[9] E.C.M. Nascimento, J.B.L. Martins, Pharmacophoric profile: design of new potential drugs with PCA analysis, in: Princ. Compon. Anal. - Multidiscip. Appl., InTech, 2012, pp. 59–72.
[10] F. Reverter, E. Vegas, J.M. Oller, Kernel methods for dimensionality reduction applied to the «Omics» data, in: Princ. Compon. Anal. - Multidiscip. Appl., InTech, 2012, pp. 1–20.
[11] D. Magyar, G. Oros, Chapter 17: application of the principal component analysis to disclose factors influencing on the composition of fungal consortia deteriorating remained fruit, in: Princ. Compon. Anal. - Multidiscip. Appl., InTech, 2012.
[12] E. Belasco, B.U. Philips, G. Gong, The health care access index as a determinant of delayed cancer detection through principal component analysis, in: Princ. Compon. Anal. - Multidiscip. Appl., InTech, 2012, pp. 143–166.
[13] J. Yu, Fault detection using principal components-based Gaussian mixture model for semiconductor manufacturing processes, IEEE Trans. Semicond. Manuf. 24 (2011) 432–444, http://dx.doi.org/10.1109/tsm.2011.2154850.
[14] J. Xiang, Y. Zhong, H. Gao, Rolling element bearing fault detection using PPCA and spectral kurtosis, Measurement 75 (2015) 180–191, http://dx.doi.org/10.1016/j.measurement.2015.07.045.
[15] B. Jiang, J. Xiang, Y. Wang, Rolling bearing fault diagnosis approach using probabilistic principal component analysis denoising and cyclic bispectrum, J. Vib. Control. 22 (2016) 2420–2433, http://dx.doi.org/10.1177/1077546314547533.
[16] Y. Tharrault, G. Mourot, J. Ragot, Fault detection and isolation with robust principal component analysis, 2008 16th Mediterr. Conf. Control Autom. 18 (2008) 429–442, http://dx.doi.org/10.1109/MED.2008.4602224.
[17] S.J. Qin, R. Dunia, Determining the number of principal components for best reconstruction, J. Process Control. 10 (2000) 245–250, http://dx.doi.org/10.1016/S0959-1524(99)00043-8.
[18] S. Tsujita, M. Tamura, A study on the number of principal components and sensitivity of fault detection using PCA, Comput. Chem. Eng. 31 (2007) 1035–1046, http://dx.doi.org/10.1016/j.compchemeng.2006.09.004.
[19] A. Benaicha, M. Guerfel, K. Bougila, Nasreddine Benothman, New PCA-based methodology for sensor fault detection and localization, in: 8th Int. Conf. Model. Simul., Hammamet, Tunisia, 2010.
[20] B. Bakshi, Multiscale PCA with application to multivariate statistical process monitoring, AIChE J. 44 (1998) 1596–1610.
[21] M.N. Nounou, Multiscale finite impulse response modeling, Eng. Appl. Artif. Intell. 19 (2006) 289–304, http://linkinghub.elsevier.com/retrieve/pii/S0952197605001119.
[22] M.N. Nounou, H.N. Nounou, Improving the prediction and parsimony of ARX models using multiscale estimation, Appl. Soft Comput. 7 (2007) 711–721, http://linkinghub.elsevier.com/retrieve/pii/S1568494606000196.
[23] M. Mansouri, M.Z. Sheriff, R. Baklouti, M. Nounou, H. Nounou, A. Ben Hamida, M.N. Karim, Statistical fault detection of chemical process - comparative studies, J. Chem. Eng. Process Technol. 7 (2016) 1–10, http://dx.doi.org/10.4172/2157-7048.1000282.
[24] C. Botre, M. Mansouri, M. Nounou, H. Nounou, M.N. Karim, Kernel PLS-based GLRT method for fault detection of chemical processes, J. Loss Prev. Process Ind. 43 (2016) 212–224, http://dx.doi.org/10.1016/j.jlp.2016.05.023.
[25] M.R. Reynolds, J.Y. Lou, An evaluation of a GLR control chart for monitoring the process mean, J. Qual. Technol. 42 (2010) 287–310.
[26] E.L. Russell, L.H. Chiang, R.D. Braatz, Fault Detection and Diagnosis in Industrial Systems, Springer-Verlag, New York, NY, 2001.
[27] I.T. Jolliffe, Principal Component Analysis, 2nd ed., Springer-Verlag, New York, NY, 2002.
[28] J.E. Jackson, Quality control methods for several related variables, Technometrics 1 (1959) 359–377, http://www.jstor.org/stable/1266717.
[29] J.E. Jackson, G.S. Mudholkar, Control procedures for residuals associated with principal component analysis, Technometrics 21 (1979) 341–349, http://www.jstor.org/stable/1267757.
[30] F. Harrou, M.N. Nounou, H.N. Nounou, M. Madakyaru, PLS-based EWMA fault detection strategy for process monitoring, J. Loss Prev. Process Ind. 36 (2015) 108–119, http://dx.doi.org/10.1016/j.jlp.2015.05.017.
[31] G. Diana, C. Tommasi, Cross-validation methods in principal component analysis: a comparison, Stat. Methods Appl. 11 (2002) 71–82, http://dx.doi.org/10.1007/BF02511446.
[32] M. Zhu, A. Ghodsi, Automatic dimensionality selection from the scree plot via the use of profile likelihood, Comput. Stat. Data Anal. 51 (2006) 918–930, http://dx.doi.org/10.1016/j.csda.2005.09.010.
[33] S. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, Pattern Anal. Mach. Intell. IEEE Trans. 11 (1989) 674–693.
[34] H.N. Nounou, M.N. Nounou, Multiscale fuzzy Kalman filtering, Eng. Appl. Artif. Intell. 19 (2006) 439–450, http://linkinghub.elsevier.com/retrieve/pii/S0952197605001302.
[35] M.N. Nounou, H.N. Nounou, N. Meskin, A. Datta, E.R. Dougherty, Multiscale denoising of biological data: a comparative analysis, IEEE/ACM Trans. Comput. Biol. Bioinform. 9 (2012) 1539–1544.
[36] R. Ganesan, T.K. Das, V. Venkataraman, Wavelet-based multiscale statistical process monitoring: a literature review, IIE Trans. 36 (2004) 787–806.
[37] J.B. Buckheit, D.L. Donoho, WaveLab and reproducible research, in: A. Antoniadis, G. Oppenheim (Eds.), Wavelets Stat., 1st ed., Springer, New York, NY, 1995, pp. 55–81, http://dx.doi.org/10.1017/CBO9781107415324.004.
[38] D.C. Montgomery, G.C. Runger, Applied Statistics and Probability for Engineers, 5th ed., John Wiley & Sons, Inc, Hoboken, NJ, 2011.
[39] F. Gustafsson, The marginalized likelihood ratio test for detecting abrupt changes, IEEE Trans. Automat. Contr. 41 (1996) 66–78, http://dx.doi.org/10.1109/9.481608.
[40] A.S. Willsky, E.Y. Chow, S.B. Gershwin, A.L. Kurkjian, C.S. Greene, P.K. Houpt, Dynamic model-based techniques for the detection of incidents on freeways, IEEE Trans. Automat. Contr. 25 (1980) 347–360, http://dx.doi.org/10.1109/tac.1980.1102392.
[41] J.R. Dowdle, A.S. Willsky, S.W. Gully, Nonlinear generalized likelihood ratio algorithms for maneuver detection and estimation, Am. Control Conf. (1982) 985–987.
[42] S.M. Kay, Fundamentals of Statistical Signal Processing, Volume II: Detection Theory, 1st ed., Pearson, 1998.
[43] L. Meng, J. Xiang, Y. Wang, Y. Jiang, H. Gao, A hybrid fault diagnosis method using morphological filter-translation invariant wavelet and improved ensemble empirical mode decomposition, Mech. Syst. Signal Process. 50–51 (2015) 101–115, http://dx.doi.org/10.1016/j.ymssp.2014.06.004.
[44] M.Z. Sheriff, F. Harrou, M. Nounou, Univariate process monitoring using multiscale Shewhart charts, in: 2014 Int. Conf. Control. Decis. Inf. Technol., IEEE, Metz, France, 2014, pp. 435–440.
[45] M.Z. Sheriff, M.N. Nounou, Enhanced performance of Shewhart charts using multiscale representation, in: 2016 Am. Control Conf., IEEE, Boston, MA, 2016, pp. 6923–6928, http://dx.doi.org/10.1109/ACC.2016.7526763.
[46] M.Z. Sheriff, Improved Shewhart Chart Using Multiscale Representation, Texas A&M University, 2015.
[47] A. Cinar, A. Palazoglu, F. Kayihan, Chemical Process Performance Evaluation, 1st ed., CRC Press, Boca Raton, FL, 2007.
[48] R.B. Hatchett, B.W. Brorsen, K.B. Anderson, Optimal length of moving average to forecast futures basis, J. Agric. Resour. Econ. 35 (2010) 18–33.
[49] Y.-H. Tsai, F.-R. Chang, W.-C. Yang, GPS fault detection and exclusion using moving average filters, IEE Proc. - Radar, Sonar Navig. (2004) 240.
[50] G.E.P. Box, G.M. Jenkins, G.C. Reinsel, Time Series Analysis: Forecasting and Control, 4th ed., John Wiley & Sons, Inc, Hoboken, NJ, 2008.
[51] R. Fazia, O. Taouali, I. Elaissi, N. Bouguila, Online fault detection methods and fault detection indices based on PCA approach, in: Recent Res. Electr. Eng., Lisbon, Portugal, 2014, pp. 69–78, http://www.wseas.us/e-library/conferences/2014/Lisbon/ELEL/ELEL-08.pdf.
[52] S.J. Qin, W. Li, Detection, identification, and reconstruction of faulty sensors with maximized sensitivity, AIChE J. 45 (1999) 1963–1976, http://dx.doi.org/10.1002/aic.690450913.
[53] J.J. Downs, E.F. Vogel, A plant-wide industrial process control problem, Comput. Chem. Eng. 17 (1993) 245–255, http://dx.doi.org/10.1016/0098-1354(93)80018-I.
[54] P.R. Lyman, C. Georgakis, Plant-wide control of the Tennessee Eastman problem, Comput. Chem. Eng. 19 (1995) 321–331, http://dx.doi.org/10.1016/0098-1354(94)00057-U.
[55] S. Yin, S.X. Ding, A. Haghani, H. Hao, P. Zhang, A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process, J. Process Control. 22 (2012) 1567–1581, http://dx.doi.org/10.1016/j.jprocont.2012.06.009.
[56] T. Rato, M. Reis, E. Schmitt, M. Hubert, B. De Ketelaere, A systematic comparison of PCA-based Statistical Process Monitoring methods for high-dimensional, time-dependent Processes, AIChE J. 62 (2016) 1478–1493, http://dx.doi.org/10.1002/aic.15062.