Iterated Robust kernel Fuzzy Principal Component ...

G Model

ARTICLE IN PRESS

JOCS-432; No. of Pages 16

Journal of Computational Science xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Journal of Computational Science journal homepage: www.elsevier.com/locate/jocs

Iterated Robust kernel Fuzzy Principal Component Analysis and application to fault detection

Raoudha Baklouti b,c, Majdi Mansouri a,∗, Mohamed Nounou d, Hazem Nounou c, Ahmed Ben Hamida b

a Electrical and Computer Engineering Program, Texas A&M University at Qatar, Doha, Qatar
b Advanced Technologies for Medicine and Signals, National Engineering School of Sfax, Tunisia
c Electrical and Computer Engineering Program, Texas A&M University at Qatar, Doha, Qatar
d Chemical Engineering Program, Texas A&M University at Qatar, Doha, Qatar

Article info

Article history: Received 28 June 2015; received in revised form 27 October 2015; accepted 22 November 2015; available online xxx

Keywords: Iterated robust fuzzy; Kernel principal component analysis; Fault detection; Modeling

Abstract

In this paper, we propose an Iterated Robust kernel Fuzzy Principal Component Analysis (IRkFPCA) method. It combines the advantages of state-of-the-art methods and uses a more accurate multi-objective function that jointly reduces the modeling error, improves the robustness to outliers, and improves the time complexity: because it does not require the storage and inversion of the covariance matrix, it yields a memory-efficient approximation of kernel PCA. The proposed technique iteratively computes the principal components, which are used for modeling and fault detection. The detection stage relies on the evaluation of residuals, also known as detection indices, which are signals that reveal the presence of a fault. These indices are obtained by analyzing the difference between the process measurements and their estimates computed with the IRkFPCA technique. The performance of the proposed method is illustrated and compared to the Iterated kernel Principal Component Analysis (IkPCA) and Iterated Principal Component Analysis (IPCA) methods through two simulated examples, one using synthetic data and the other using simulated continuously stirred tank reactor (CSTR) data. The comparative studies show that the developed IRkFPCA method provides better modeling and fault detection accuracy than the Iterated Robust Fuzzy Principal Component Analysis (IRFPCA) and IkPCA methods, while both of these methods provide improved accuracy over the IPCA method. © 2015 Elsevier B.V. All rights reserved.

∗ Corresponding author at: Texas A&M Engineering Building, Education City, 23874 Doha, Qatar. E-mail address: [email protected] (M. Mansouri).

http://dx.doi.org/10.1016/j.jocs.2015.11.005
1877-7503/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Growing demands for consistent product quality and stricter safety requirements have made process monitoring a key factor in improving productivity and safety. Process systems produce large amounts of data from many variables that are monitored and recorded continuously. Fault detection, which must respond effectively to faults that disturb the process and harm system reliability, is therefore a key task in the operation of these systems and an important part of process monitoring. The management of abnormal events depends mainly on accurate fault detection and on the diagnosis of the process faults. State variables in such applications are more difficult to control and diagnose. Process monitoring has advanced considerably over the last five decades, along with computer technology. Nevertheless, the more important monitoring tasks are still performed by humans. These tasks include responding to abnormal events in the monitored plant, which involves detecting the faults in time, diagnosing their causes, and then bringing the process back to a normal and safe operating state. In this work, we focus on modeling and fault detection tasks based on more sophisticated statistical approaches.

Monitoring atmospheric air pollution levels is also extremely important for the safety of humans and marine life, especially in areas with large fuel production or consumption and large climate fluctuations [1]. For example, the heat wave in France in the summer of 2003 was linked to an exceptional ozone pollution episode that affected the entire European community [2]. The consequences of this heat wave demonstrated the importance of having reliable warning systems to detect unexpected pollution levels and unforeseeable events [2]. Proper monitoring of air pollutants provides useful information that can help people take the needed precautions to avoid undesirable consequences. During the past few decades, a lot of effort has been made to improve air quality. In Qatar, for example, an air quality network consisting of several measurement stations was installed across the country to measure the concentration levels of various air pollutants. This makes it possible to take the requisite measures to protect the environment and to ensure that the public is properly informed about any dangers, provided that faults or anomalies are detected accurately.

Possible faults can be due to malfunctioning sensors (sensor faults) or to abnormal changes in the process (process faults). Sensor faults usually appear as sudden (or quick) changes in a small number of process variables. Process faults, on the other hand, are abnormal changes caused by deviations in the process itself, and usually appear as slow drifts across several variables. The need for monitoring techniques that can accurately and quickly detect abnormal situations (sensor or process faults) has attracted considerable attention from researchers and engineers. Over the past few decades, several monitoring techniques have been developed [3–6].

In this work, we develop a new improved iterated nonlinear PCA to address this need. The developed Iterated Robust kernel Fuzzy PCA (IRkFPCA) relates the kernel PCA learning rules to an accurate multi-objective function that accounts for several parameters (modeling error, robustness, fuzziness and memory efficiency) that may influence the relevance of the modeling and monitoring in process operations. The proposed technique iteratively computes the principal components (PCs), which are used to compute the model and the two charts, $T^2$ and $Q$, for the fault detection problem.

The rest of the paper is organized as follows. Section 2 discusses related work and motivates the need for the proposed technique. Section 3 describes the multi-objective function-based Iterated Robust kernel Fuzzy PCA, followed by the two main detection indices, $T^2$ and $Q$, which are used for fault detection. Section 4 assesses the performance of the proposed IRkFPCA in estimating the model and detecting faults. Conclusions are presented in Section 5.

2. Related work

Several works have been proposed in the areas of process monitoring and fault detection. Generally, fault detection techniques can be classified into three main categories: data-based or model-free techniques, model-based techniques, and knowledge-based techniques [3,5]. Fault detection using knowledge-based techniques is usually a heuristic process. Techniques in this category are mostly based on causal analysis, expert systems [7], possible cause and effect graphs (PCEG) [8], failure mode and effect analysis (FMEA) [9], Hazop-digraph (HDG) [10], or Bayesian networks [11]. The main limitation of these methods is that they are better suited to small-scale systems with a small number of variables, and thus may not be appropriate for monitoring complex processes.

Model-based monitoring methods, on the other hand, rely on comparing the process measurements with knowledge obtained from a mathematical process model, which is usually derived from a fundamental understanding of the process under fault-free conditions. The residuals, which are the differences between the measurements and the model predictions, can be used as an indicator of the presence or absence of faults [12,13]. When the monitored process is under normal operating conditions (no faults exist), the residuals are zero, or close to zero in the presence of modeling uncertainties and measurement noise. When a fault occurs, however, the residuals deviate significantly from zero, indicating a new condition that is clearly distinguishable from the normal faultless working mode [12,13]. The model-based monitoring approaches include observer-based approaches [14–16], parity space approaches [17–21], and interval approaches [22–24]. The effectiveness of these methods depends on the accuracy of the models used. Unfortunately, it is sometimes very difficult to derive accurate models of the monitored systems, especially for complex processes such as many chemical and environmental processes. For example, modeling the ozone level is very challenging because of the complexity of the ozone formation mechanisms in the troposphere and the uncertainty about the meteorological conditions [25]. Modeling chemical and petrochemical processes is also challenging because of their complexity and, sometimes, the lack of understanding of these processes. In these cases, data-based monitoring techniques are more commonly used. Data-based monitoring methods rely on the availability of historical data obtained from the monitored fault-free process [5]. These data are first used to build an empirical model, which is then used to detect faults in future data.

Data-based monitoring methods include latent variable regression methods, such as partial least squares (PLS) regression, principal component analysis (PCA), independent component analysis (ICA) and canonical variate analysis (CVA) [26,5], as well as neural networks [27], fuzzy systems [28], pattern recognition methods [29], and support vector machine (SVM) based methods [30,31]. SVM-based fault detection methods can be applied to nonlinear systems and offer advantages over conventional nonlinear optimization-based techniques. Data-based monitoring methods, especially those that use PCA or its extensions, have been widely used in a very wide range of industries, e.g., air quality monitoring [32], chemical industry [33,34], water treatment [35,36], pharmacology [37], biology and biotechnology [38], agriculture [39], health [40], semiconductors [41], and many others. PCA provides linear combinations of variables that capture the major trends in a data set. In mathematical terms, PCA is based on the orthogonal decomposition of the covariance matrix of the process variables along the directions that give the maximum variation of the data. PCA has been studied for two problems: Multivariate Statistical Process Control (MSPC) [42] and fault detection and isolation (FDI) [43–47]. The authors in [48] have grouped fault detection and diagnosis methods into three categories: (i) quantitative model-based schemes, (ii) qualitative models and search strategies, and (iii) process data-based methods. PCA belongs to the third category because it uses historical data to obtain the statistical model (PCA model). The main indices used with PCA methods are the Hotelling statistic, $T^2$, and the sum of squared residuals, SPE, or $Q$ statistic. The $T^2$ statistic measures the variation captured by the PCA model, and the $Q$ statistic measures the amount of variation not captured by the PCA model.
Nevertheless, conventional PCA has some disadvantages: (i) it assumes that the relationships between variables are linear, and hence may not be the most appropriate method of analysis in nonlinear cases; (ii) it is a least-squares estimation technique, and hence fails to account for outliers, which are common in realistic training sets; (iii) it is not suitable for on-line process monitoring, because it becomes computationally infeasible to solve the eigenvalue problem directly for a large number of samples. A similar problem occurs when the covariance matrix becomes large.

Several solutions have been proposed to address the above issues [49–56]. The authors in [51–53] have proposed nonlinear extensions of PCA that handle a wide range of nonlinearities and perform better than the linear versions. The authors in [49,50,54] have proposed robust fuzzy analysis techniques to extract more information from the data and make PCA robust to outliers, since outliers are known to influence the resulting principal components and hence the modeling and fault detection performance. Moreover, to make conventional PCA suitable for on-line process monitoring, the iterated PCA (IPCA) [56] and iterative kernel PCA (IkPCA) [55] have been proposed; however, they have mainly been used for image modeling. IkPCA has been shown to provide good results compared to kPCA and IPCA [56], but the developed IPCA and IkPCA techniques use only the generalized Hebbian algorithm (GHA) or the kernel Hebbian algorithm (KHA) to reduce the memory complexity. In addition, kPCA is a powerful nonlinear modeling technique but, unfortunately, it cannot be updated online as new data are collected. Therefore, the objective of this work is to develop an IRkFPCA modeling scheme that provides the same predictive abilities as kPCA while keeping the computational complexity at a level where the scheme can be implemented online. The proposed technique iteratively computes the principal components of the kPCA model online using a multi-objective function formulation.

3. Iterated Robust kernel Fuzzy Principal Component Analysis (IRkFPCA) Description

Next, we present the classical principal component analysis.

3.1. Principal component analysis (PCA)

Let $X_i \in R^m$ denote the $i$th sample vector of $m$ sensors. Assuming that $n$ samples are available for each sensor, the data matrix $X \in R^{n \times m}$ is built with one sample per row. The matrix $X$ is scaled to zero mean for covariance-based PCA and, in addition, to unit variance for correlation-based PCA [46]. The data matrix $X$ is decomposed as follows:

$$X = U \Sigma W^T, \tag{1}$$

where $U$ is an $n$-by-$n$ matrix, and $W = [w_1\ w_2\ \ldots\ w_m] \in R^{m \times m}$ is an orthogonal matrix whose columns $w_i \in R^m$ are the eigenvectors of the covariance matrix of $X$, i.e., $\Sigma$, which is given by

$$\Sigma = \frac{1}{n-1} X^T X = W \Lambda W^T \quad \text{with} \quad W W^T = W^T W = I_m, \tag{2}$$

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$ is a diagonal matrix containing the eigenvalues related to the $m$ PCs ($\lambda_1 > \lambda_2 > \ldots > \lambda_m$), and $I_m$ is the identity matrix [57]. The effectiveness of the PCA model depends on the number of PCs that are retained. Selecting an appropriate number of PCs leads to good process monitoring performance. Several methods for determining the number of PCs have been proposed, such as the Scree plot [58], the cumulative percent variance (CPV), cross validation [59], and the profile likelihood [60]. In this study, the cumulative percent variance method is used to determine the optimum number of retained PCs.

Once the number of PCs, $l$, is determined, the data matrix $X$ is expressed as:

$$X = S W^T = [\hat{S}\ \ \tilde{S}]\,[\hat{W}\ \ \tilde{W}]^T, \tag{3}$$

where $\hat{S} \in R^{n \times l}$ and $\tilde{S} \in R^{n \times (m-l)}$ are the matrices of the $l$ retained PCs and the $(m-l)$ ignored PCs, respectively, and $\hat{W} \in R^{m \times l}$ and $\tilde{W} \in R^{m \times (m-l)}$ are the matrices of the $l$ retained eigenvectors and the $(m-l)$ ignored eigenvectors, respectively. Using Eq. (3), the following can be written:

$$X = \hat{S}\hat{W}^T + \tilde{S}\tilde{W}^T = \underbrace{X \hat{W}\hat{W}^T}_{\hat{X}} + \underbrace{X (I_m - \hat{W}\hat{W}^T)}_{R}. \tag{4}$$

The matrix $\hat{X}$ represents the modeled variation of $X$ based on the first $l$ components, and the matrix $R$ represents the variations corresponding to process noise.

The developed Iterated Robust Fuzzy Principal Component Analysis uses a more accurate multi-objective function that jointly reduces the modeling error, improves the robustness to outliers and improves the memory efficiency. Next, we present the multi-objective function.

3.2. Multi-objective function

The main idea of the multi-objective function is to define the main parameters that may influence the relevance of the modeling and monitoring in process operations, which are: (i) the modeling error $J_1(W)$ (detailed in Section 3.2.1), (ii) the fuzziness and robustness $J_2(W)$ (detailed in Section 3.2.2) and (iii) the memory efficiency $J_3(W)$ (detailed in Section 3.2.3). The problem is how to formulate the criteria for selecting the optimal eigenvector matrix $W$ that minimizes the multi-objective function and provides a satisfactory model. Thus, the objective is to choose the optimal $W$ that minimizes the modeling error, maximizes the robustness and fuzziness with respect to noise, and minimizes the memory requirement, so that

$$\hat{W} = \arg\min [J(W)], \tag{5}$$

where the multi-objective function $J(W)$ is given by

$$J(W) = [\alpha_1\ \alpha_2\ \alpha_3]\,[J_1(W)\ J_2(W)\ J_3(W)]^T, \tag{6}$$

and $\alpha_1$, $\alpha_2$ and $\alpha_3$ are the learning rates of $J_1$, $J_2$ and $J_3$, respectively.

3.2.1. Modeling error

The modeling error is the distance between the original data and its estimate $\hat{X}$ obtained with the Iterated Robust Fuzzy Principal Component Analysis (IRFPCA), so that

$$E(X) = \|X - \hat{X}\|_F = \|X - X \hat{W}\hat{W}^T\|_F, \tag{7}$$

where $\|\cdot\|_F$ is the Frobenius norm. Next, we present the energy function that minimizes the modeling error distance by incorporating the constraints on the principal components [61]:

$$\min \|X - X \hat{W}\hat{W}^T\|_F \quad \text{s.t.} \quad \hat{W}\hat{W}^T = 1, \quad \hat{W}_i \hat{W}_j = 0 \ (i \neq j). \tag{8}$$

Using Lagrange multipliers to incorporate the constraints into the objective, the energy function $J_1(W)$ can be expressed as:

$$J_1(W) = E[X^T W^T W X] + \frac{1}{2}\,\mathrm{tr}[L (W^T W - I)], \tag{9}$$

where $L$ is the Lagrange multiplier matrix and tr is lower triangular. Taking the gradient with respect to $W$ and setting it to zero yields

$$\partial_W J_1 = E[WX]X^T + L W = 0. \tag{10}$$


As a consequence of the KKT conditions [62], at optimality,

$$L (W W^T - I) = 0. \tag{11}$$

Right-multiplying Eq. (10) by $W^T$ and using Eq. (11) gives

$$L = -E[WX]X^T W^T = -E[Y]Y^T. \tag{12}$$

Plugging Eq. (12) into Eq. (10) and stochastically approximating the expectation $E[Y]$ with its instantaneous estimate $Y(t) = W(t)X(t)$ yields

$$J_1(W(t)) = Y(t)X(t)^T - Y(t)Y(t)^T W(t), \tag{13}$$

and

$$W(t+1) = W(t) + \alpha_1 J_1(W(t)), \tag{14}$$

where $\alpha_1$ is the learning rate for the objective function $J_1$. Thus, the first objective function, $J_1$, aims to reduce the modeling error.

3.2.2. Fuzziness and robustness

PCA can be affected by outliers, and several contributions have been proposed to make PCA robust [49,50]. Outliers are known to influence the resulting principal components, and hence they also affect the modeling and fault detection performance. On the other hand, many fuzzy approaches in regression analysis address this issue [63,64], such as fuzzy clustering, which is an important technique for distinguishing between healthy and faulty structures and for identifying the structure in the data. To deal with the above problems, we propose to use the objective function $J_2$ proposed in [65], where the PCA learning rules are related to energy functions that take outliers into account:

$$\min J_2(W) = \sum_{i=1}^{n} (U_i)^r E(X_i) + \eta \sum_{i=1}^{n} (1 - U_i)^r, \quad \text{s.t.} \quad U_i \in [0, 1], \quad r \in [1, \infty[, \tag{15}$$

where $U_i$ is the membership of $X_i$ belonging to the data, $(1 - U_i)$ is the membership of $X_i$ belonging to the noise, $r$ is the weighting exponent, and $E(X_i)$ measures the error between $X_i$ and the class center. The objective is to minimize Eq. (15) with respect to $W$ and $U_i$ simultaneously. Since $W$ is a continuous variable and $U_i$ is a binary variable, the problem mixes continuous and discrete optimization. To overcome this, the authors in [66] transformed the objective from the minimization of Eq. (15) to the maximization of the following Gibbs distribution:

$$P(U, W) = \frac{\exp(-\beta E(U, W))}{Z}, \tag{16}$$

where $Z$ is the function that ensures $\sum_U \int_W P(U, W) = 1$, and $\beta$ and $\eta$ are two parameters of the FRPCA algorithm. The authors in [66] computed the marginal distribution $P_{marginal}(W)$ to approximate the maximization of $P(U, W)$, using the saddle point method proposed in [67]. First, the gradient of $J_2(W)$ is computed with respect to $U_i$; setting $\partial_{U_i} J_2 = 0$, we get

$$U_i = \frac{1}{1 + (E(X_i)/\eta)^{1/(r-1)}}. \tag{17}$$

Substituting this membership back and simplifying, we get

$$J_2(W) = \sum_{i=1}^{n} \left( \frac{1}{1 + (E(X_i)/\eta)^{1/(r-1)}} \right)^{r} E(X_i). \tag{18}$$

Using the multidimensional chain rule, the gradient of $J_2$ with respect to $W$ is given by

$$\partial_W J_2 = \partial_{E(X_i)} J_2\ \partial_W E(X_i) = \sum_{i=1}^{n} \left( \frac{1}{1 + (E(X_i)/\eta)^{1/(r-1)}} \right)^{r} \partial_W E(X_i), \tag{19}$$

where $r$ is called the fuzziness variable. After training, the membership $U_i$ is decided as follows:

$$U_i = \begin{cases} 1 & \text{if } E(X_i) < \eta, \\ 0 & \text{otherwise,} \end{cases} \tag{20}$$

so that $\eta$ plays the role of a threshold. If $r \to \infty$, the maximum fuzziness is achieved: $U_i = 1/2$ for all $X_i$. The distance $E(X_i)$ may be expressed as [65,68]

$$E(X_i) = \|X_i\|^2 - \frac{\|W^T X_i\|^2}{\|W\|^2}. \tag{21}$$

Using Eqs. (18) and (21), the objective function is given by

$$J_2(W) = \left( \frac{1}{1 + \left( \left( \|X\|^2 - \frac{\|W^T X\|^2}{\|W\|^2} \right) / \eta \right)^{1/(r-1)}} \right)^{r} \left( \|X\|^2 - \frac{\|W^T X\|^2}{\|W\|^2} \right). \tag{22}$$
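The membership rule of Eq. (17), the hard decision of Eq. (20) and the distance of Eq. (21) can be illustrated with a minimal sketch for a single direction vector. The function names are hypothetical, the soft threshold is written `eta`, and clipping a slightly negative error to zero is a numerical safeguard added here; none of this is taken from the authors' implementation.

```python
import math


def reconstruction_error(x, w):
    """Distance of Eq. (21) for one sample x and one direction w:
    E(x) = ||x||^2 - (w^T x)^2 / ||w||^2."""
    xx = sum(v * v for v in x)
    wx = sum(a * b for a, b in zip(w, x))
    ww = sum(a * a for a in w)
    return xx - wx * wx / ww


def fuzzy_membership(err, eta, r):
    """Soft membership of Eq. (17): U = 1 / (1 + (E / eta)^(1 / (r - 1))).
    Requires r > 1; eta is the (assumed) soft threshold."""
    if r <= 1:
        raise ValueError("r must be greater than 1")
    err = max(err, 0.0)  # safeguard against tiny negative rounding errors
    return 1.0 / (1.0 + (err / eta) ** (1.0 / (r - 1.0)))


def hard_membership(err, eta):
    """Decision after training, Eq. (20): 1 if E < eta, otherwise 0."""
    return 1 if err < eta else 0
```

With `r = 2`, a sample whose error equals `eta` receives a membership of 0.5, and for very large `r` every sample with a positive error approaches a membership of 1/2, which matches the limiting behaviour stated in the text.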

The gradient descent rule for estimating the weight $W(t)$ at every time instant $t$ is given by:

$$W(t+1) = W(t) + \alpha_2 J_2(W(t)), \tag{23}$$

where $\alpha_2$ is the learning rate for the objective function $J_2$. Hence, the second objective function, $J_2$, provides robustness to noise.

3.2.3. Memory efficiency

Computationally, it becomes infeasible to solve the eigenvalue problem directly for a large number of samples. This occurs when the size of the covariance matrix becomes large. To deal with this issue, an IPCA based on the generalized Hebbian algorithm has been proposed in [69]; it provides a memory-efficient implementation that requires very little memory. The proposed IPCA solves the eigenvalue problem by an iterative scheme that does not need to compute and store the covariance matrix. To estimate the weight $W(t)$ at every time instant, the authors in [69] have proposed the following iterative rule:

$$W(t+1) = W(t) + \alpha_3 J_3(W(t)), \tag{24}$$

where $\alpha_3$ is the learning rate, and $J_3$ is the objective function given as follows [69]:

$$J_3(W(t)) = \eta(t)\,[Y(t)X(t)^T - \mathrm{tr}(Y(t)Y(t)^T)\,W(t)], \tag{25}$$

where $\eta(t)$ is a learning rate parameter and tr is the lower triangular operator. To accelerate the convergence, an extension of the IPCA method has been proposed in [70], which incorporates a scalar gain parameter $\eta(t)$ that guarantees convergence on stationary input. This scalar gain parameter is expressed as [70]

$$\eta(t) = \frac{\eta_0\, \tau}{t + \tau}, \tag{26}$$

where $t$ is the iteration number, $\eta_0$ is a positive tuning parameter, and $\tau$ is the parameter that determines the length of an initial search phase with near-constant gain. Thus, the third objective function, $J_3$, aims to improve the memory efficiency.
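The iterative rule of Eqs. (24)–(26) can be sketched with NumPy as a plain generalized Hebbian (Sanger) update that never forms the covariance matrix. This is only an illustration under stated assumptions: the function names, the default values of `eta0` and `tau`, the number of epochs, the random initialization and the gain form `eta0 * tau / (t + tau)` are choices made here, samples are stored as rows, and the factor $\alpha_3$ is absorbed into the gain.

```python
import numpy as np


def gain(t, eta0=0.005, tau=5000.0):
    """Decaying scalar gain, assumed form eta0 * tau / (t + tau):
    close to eta0 while t << tau, then decreasing roughly as 1/t."""
    return eta0 * tau / (t + tau)


def gha_fit(X, n_components, epochs=10, eta0=0.005, tau=5000.0, seed=0):
    """Estimate the leading principal directions one sample at a time.

    X is an (n, m) zero-mean data matrix. The rows of the returned
    (n_components, m) matrix W approximate the leading eigenvectors.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = 0.1 * rng.standard_normal((n_components, m))  # small random start
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            x = X[i]
            y = W @ x  # projections Y(t) = W(t) X(t)
            # Hebbian term minus lower-triangular decorrelation term
            W += gain(t, eta0, tau) * (
                np.outer(y, x) - np.tril(np.outer(y, y)) @ W
            )
            t += 1
    return W
```

Only the current sample, its projection and `W` are held in memory at each step, which is the property the text attributes to the iterative scheme. The step size has to be small relative to the data variance; the defaults above are illustrative and would need tuning for other data.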


In this work, the developed $J$-based IRFPCA technique iteratively computes the PCs $W$, which are used for modeling and fault detection (see Section 4). The $J$-based IRFPCA is expected to perform well in both fault detection and modeling, since it accounts for several parameters that may influence the relevance of the modeling and monitoring in process operations. It uses a multi-objective function $J$ that combines three objective functions: (i) $J_1$, which aims to reduce the modeling error by minimizing the variance $Y = WX$; (ii) $J_2$, which is robust to outliers and uses the fuzziness covariance to solve the eigenvalue problem; and (iii) $J_3$, which improves the memory efficiency. Next, we present the multi-objective function-based Iterated Robust Fuzzy Principal Component Analysis method.

3.3. Iterated Robust Fuzzy Principal Component Analysis (IRFPCA) method

When the size of the covariance matrix is large, it is better to solve the eigenvalue problem with iterative schemes that do not need to compute and store the covariance matrix. To compute the PCs iteratively, the developed Iterated Robust Fuzzy Principal Component Analysis uses the following iterative rule:

$$W(t+1) = W(t) + [\alpha_1\ \alpha_2\ \alpha_3]\,[J_1(W(t))\ J_2(W(t))\ J_3(W(t))]^T. \tag{27}$$

Using Eqs. (14), (22) and (25), we get the following iterative scheme:

$$\begin{aligned} W(t+1) = W(t) &+ \alpha_1 \left[ Y(t)X(t)^T - Y(t)Y(t)^T W(t) \right] \\ &+ \alpha_2 \left[ \left( \frac{1}{1 + \left( \left( \|X\|^2 - \frac{\|W^T X\|^2}{\|W\|^2} \right) / \eta \right)^{1/(r-1)}} \right)^{r} \left( \|X\|^2 - \frac{\|W^T X\|^2}{\|W\|^2} \right) \right] \\ &+ \alpha_3\, \eta(t) \left[ Y(t)X(t)^T - \mathrm{LT}(Y(t)Y(t)^T)\, W(t) \right], \end{aligned} \tag{28}$$

where $\mathrm{LT}(\cdot)$ denotes the lower triangular operator (written tr in Eq. (25)).

In this work, the developed $J$-based IRkFPCA technique iteratively computes the PCs $W$, which are used for modeling and fault detection (see Section 4). The $J$-based IRkFPCA is expected to perform well in both fault detection and modeling, since it accounts for several parameters that may influence the relevance of the modeling and monitoring in process operations. It uses a multi-objective function $J$ that combines three objective functions: (i) $J_1$, which aims to reduce the modeling error by minimizing the variance $Y = WX$; (ii) $J_2$, which is robust to outliers and uses the fuzziness covariance to solve the eigenvalue problem; and (iii) $J_3$, which improves the memory efficiency.

3.4. IRkFPCA fault detection indices

The IRkFPCA model is used for fault detection through its detection indices (Hotelling's $T^2$ and $Q$ statistics), which are presented next.

3.4.1. Hotelling's $T^2$ statistic

The $T^2$ statistic measures the variation captured in the PCs at various time samples, and it is defined as [71]:

$$T^2 = X^T \hat{W} \hat{\Lambda}^{-1} \hat{W}^T X, \tag{29}$$

where $\hat{\Lambda} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_l)$ is a diagonal matrix containing the eigenvalues related to the $l$ retained PCs. For new real-time data, a fault is detected when the value of the $T^2$ statistic exceeds the threshold $T^2_\alpha$, calculated as in [71,72]. The threshold used for the $T^2$ statistic is computed as [71]

$$T^2_\alpha = \frac{l(n-1)}{n-l}\, F_{l,\,n-l,\,\alpha}, \tag{30}$$

where $\alpha$ is the level of significance ($\alpha$ is usually between 1% and 5%), $n$ is the number of samples in the data set, $l$ is the number of retained PCs, and $F_{l,n-l}$ is the Fisher $F$ distribution with $l$ and $n-l$ degrees of freedom. These thresholds are computed using faultless data. When the number of observations, $n$, is high, the $T^2$ statistic threshold is approximated with a $\chi^2$ distribution with $l$ degrees of freedom, i.e., $T^2_\alpha = \chi^2_{l,\alpha}$.

3.4.2. Q statistic or squared prediction error (SPE)

New events can be detected by computing the squared prediction error, SPE or $Q$, of the residuals for a new observation. The $Q$ statistic [73] is computed as the sum of squares of the residuals. It measures the amount of variation not captured by the IRFPCA model, and it is defined as [74]:

$$Q = \|\tilde{X}\|^2 = \|X (I - \hat{W}\hat{W}^T)\|^2, \tag{31}$$

where $\tilde{X} = X - \hat{X} = X (I - \hat{W}\hat{W}^T)$. The monitored system is considered to be in normal operation if:

$$Q \leq Q_\alpha. \tag{32}$$

The threshold $Q_\alpha$ used for the $Q$ statistic can be computed as [57]

$$Q_\alpha = \varphi_1 \left[ \frac{h_0 c_\alpha \sqrt{2\varphi_2}}{\varphi_1} + 1 + \frac{\varphi_2 h_0 (h_0 - 1)}{\varphi_1^2} \right]^{1/h_0}, \tag{33}$$

where $\varphi_i = \sum_{j=l+1}^{m} \lambda_j^i$ for $i = 1, 2, 3$, $h_0 = 1 - (2\varphi_1 \varphi_3)/(3\varphi_2^2)$, and $c_\alpha$ is the value of the normal distribution at the $\alpha$ level of significance. When an unusual event changes the covariance structure of the model, the change is detected by a high value of $Q$. For new data, the $Q$ statistic is computed and compared to the threshold $Q_\alpha$ [57]; a fault is detected when this confidence limit is violated. The threshold value is computed under the assumption that the measurements are independent in time and multivariate normally distributed. The $Q$ fault detection index is highly sensitive to modeling errors, and its performance depends on the number of retained PCs, $l$ [75].

The developed IRkFPCA fault detection technique is summarized in Algorithm 1. In the next section, the performance of the proposed IRkFPCA algorithm is assessed and compared to the Iterated Principal Component Analysis, Iterated kernel Principal Component Analysis and Iterated Robust Fuzzy Principal Component Analysis methods through a simulated model. The performance of the IPCA, IkPCA, IRFPCA and IRkFPCA methods is evaluated in terms of modeling error (see Section 4.1) and fault detection (see Section 4.2).

4. IRkFPCA and applications

Next, we apply the developed IRkFPCA for modeling.



Algorithm 1: Iterated Robust kernel Fuzzy Principal Component Analysis Fault Detection Algorithm.

Input: Training fault-free data, Xtr; testing faulty data, Xtest; confidence interval $\alpha$
Output: Predicted signal, Y; $T^2$ and $Q$ statistics; $T^2_\alpha$ and $Q_\alpha$ statistic thresholds
Initializations:
• IRkFPCA parameters: learning coefficient $\alpha_0 \in (0, 1]$, soft threshold $\eta$ and weight $W$;
• IRkFPCA running step:
    Standardize the new data Xtr;
    Update the principal component coefficients or loading matrix $W$ using Eq. (28);
    Compute the optimal number of PCs to be used using the CPV method;
    Generate a residual vector, $R$, using IRkFPCA;
    Compute the $T^2$ and $Q$ statistics using Eqs. (29) and (31) for the new data;
    Compute the $T^2_\alpha$ and $Q_\alpha$ statistic thresholds using Eqs. (30) and (33);
• Testing step:
    Standardize the new data Xtest;
    Generate a residual vector, $R$, using IRFPCA;
    if $T^2 \geq T^2_\alpha$ or $Q \geq Q_\alpha$, then declare a fault.
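The standardization, CPV selection and detection-index steps of Algorithm 1 can be sketched as follows. To keep the example short, a batch eigendecomposition of the training covariance stands in for the iterative IRkFPCA update of Eq. (28), so this is not the proposed method itself; the function names and the 95% CPV level are assumptions. Only the $Q$ threshold of Eq. (33) is computed, because the $T^2$ threshold of Eq. (30) needs an $F$-distribution quantile that the Python standard library does not provide.

```python
import numpy as np
from statistics import NormalDist


def fit_pca(X_train, cpv=0.95):
    """Standardize fault-free data, eigendecompose its covariance and keep
    the smallest number of PCs whose cumulative percent variance >= cpv."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    cov = Z.T @ Z / (len(Z) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    l = min(int(np.searchsorted(ratio, cpv)) + 1, len(eigvals))
    return mean, std, eigvals, eigvecs, l


def detection_indices(X, mean, std, eigvals, eigvecs, l):
    """Per-sample T^2 (Eq. 29) and Q (Eq. 31) for the rows of X."""
    Z = (X - mean) / std
    W_hat = eigvecs[:, :l]
    scores = Z @ W_hat
    t2 = np.sum(scores**2 / eigvals[:l], axis=1)
    residual = Z - scores @ W_hat.T  # X (I - W_hat W_hat^T)
    q = np.sum(residual**2, axis=1)
    return t2, q


def q_threshold(eigvals, l, alpha=0.05):
    """Q threshold of Eq. (33) from the discarded eigenvalues (needs l < m)."""
    lam = eigvals[l:]
    phi1, phi2, phi3 = (np.sum(lam**i) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * phi1 * phi3 / (3.0 * phi2**2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper normal quantile
    term = (h0 * c_alpha * np.sqrt(2.0 * phi2) / phi1
            + 1.0 + phi2 * h0 * (h0 - 1.0) / phi1**2)
    return phi1 * term ** (1.0 / h0)
```

A fault would then be declared for every test sample whose `q` value exceeds `q_threshold(eigvals, l, alpha)`, using the mean, standard deviation and loadings estimated on the fault-free training data.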

4.1. IRkFPCA and application to modeling

The simulated model (34) is used to generate the responses of the vector of 4 state variables as functions of time, $x_t = [x_{t_1}, x_{t_2}, x_{t_3}, x_{t_4}]$, as follows:

$$\begin{cases} x_{t_1} = [-2 \times \mathrm{ones}(200, 1);\ \mathrm{ones}(300, 1);\ 3 \times \mathrm{ones}(500, 1)], \\ x_{t_2} = \sin(0.1 : 0.1 : 100)^T + \cos(0.1 : 0.1 : 100)^T, \\ x_{t_3} = .5 \times x_{t_1}^2 + .5 \times x_{t_2}^2, \\ x_{t_4} = .3 \times x_{t_1} + .7 \times x_{t_2}. \end{cases} \tag{34}$$

These simulated states are assumed to be noise free. They are then contaminated with zero-mean Gaussian errors, i.e., a measurement noise $v_{k-1} \sim N(0, \sigma_v^2)$, so that

$$\begin{cases} X_1 = x_{t_1} + \sigma_v \times \mathrm{randn}(\mathrm{size}(x_{t_1})), \\ X_2 = x_{t_2} + \sigma_v \times \mathrm{randn}(\mathrm{size}(x_{t_2})), \\ X_3 = x_{t_3} + \sigma_v \times \mathrm{randn}(\mathrm{size}(x_{t_3})), \\ X_4 = x_{t_4} + \sigma_v \times \mathrm{randn}(\mathrm{size}(x_{t_4})). \end{cases} \tag{35}$$

The generated data were arranged as a matrix $X = [X_1, X_2, X_3, X_4]$ with 1000 samples and 4 variables. The responses of the 4 state variables $X_1$, $X_2$, $X_3$ and $X_4$ are shown in Fig. 1.

From Fig. 2(a) and (b), we can see that IRFPCA is less sensitive to outliers than IPCA or IkPCA, because some kernels have a noise-suppressing property and the fuzzy membership diminishes the effect of outliers, which makes the method robust. Indeed, IRFPCA uses a more accurate multi-objective function $J(W, \alpha)$, whereas IPCA/IkPCA use only the GHA/KHA to reduce the memory complexity and then solve the eigenvalue problem by iterative schemes. As shown in Fig. 2(a) and (b), the proposed technique yields the lowest estimation error compared with the IPCA, IkPCA and IRFPCA methods when the learning rate vector $\alpha$ is fixed to $[1/3, 1/3, 1/3]$. Fig. 2 also shows the evolution of the modeling error obtained with the IPCA, IkPCA, IRFPCA and IRkFPCA methods for special-case values of $\alpha$. For example, Fig. 2(e) illustrates the modeling results of the four methods for $\alpha_1 = 0$, $\alpha_2 = 0$ and $\alpha_3 = 1$. Fig. 2 further shows that, for the different values of $\alpha$, the proposed IRkFPCA method provides the smallest modeling error. This superiority lies in its ability to combine the advantages of the IPCA and IkPCA methods and to use a more accurate multi-objective function that jointly reduces the modeling error, improves the robustness to outliers and improves the memory efficiency. Table 1 reports the time spent by IkPCA and IRkFPCA in the fault detection task, where $\alpha_1 = 1/3$, $\alpha_2 = 1/3$ and $\alpha_3 = 1/3$.

Table 1
Computational times.

Technique    Execution time
IkPCA        3.15 s
IRkFPCA      4.93 s

[Figure 1: four panels showing the original data X1, X2, X3 and X4 versus sample number (0 to 1000).]

Fig. 1. The time evolution of the generated data.



Fig. 2. Modeling error with different values of the learning rate scalars α1, α2 and α3.

These results confirm that the IRkFPCA method outperforms the classical IkPCA algorithm in terms of modeling error and fault detection accuracy, because it jointly reduces the modeling errors, improves the robustness to outliers and optimizes the memory efficiency. With respect to the execution time, however, IkPCA is faster than the proposed IRkFPCA algorithm (3.15 s versus 4.93 s in Table 1), since IRkFPCA jointly optimizes the modeling errors, the robustness to outliers and the memory efficiency during the fault detection phase and considers the complete enumeration of the solution space.


[Figure 3 panels: (a) to (c) IRFPCA mean square modeling error versus α1, α2 and α3; (d) to (f) IRkFPCA mean square modeling error versus α1, α2 and α3. Horizontal axes range from 0.1 to 0.8.]

Fig. 3. Mean square modeling error analysis versus the learning rate scalars α1, α2 and α3.
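The quantity plotted in Fig. 3 is the mean square modeling error, i.e., the average squared difference between the noise-free data and the estimated data. A minimal sketch of that computation is given here, together with a sweep of α1 in which α2 is fixed to 0.1 and α3 = 1 − α1 − α2, as in Fig. 3(d). The function name and the toy matrices are hypothetical; how the estimate is obtained (the IRkFPCA model itself) is outside the scope of the sketch.

```python
def msme(true_data, estimated_data):
    """Mean square modeling error: average of (x - x_hat)^2 over all entries.

    Both arguments are lists of rows (samples), each row a list of variables.
    """
    total, count = 0.0, 0
    for row_true, row_est in zip(true_data, estimated_data):
        for x, x_hat in zip(row_true, row_est):
            total += (x - x_hat) ** 2
            count += 1
    return total / count


# Learning-rate triplets for a sweep over alpha1 with alpha2 = 0.1 and
# alpha3 = 1 - alpha1 - alpha2 (alpha1 from 0.1 to 0.8, as on the Fig. 3 axes).
alpha_sweep = [
    (round(0.1 * k, 1), 0.1, round(1.0 - 0.1 * k - 0.1, 1)) for k in range(1, 9)
]

# Toy check on a 2 x 2 data set with a single mismatching entry.
toy_error = msme([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 6.0]])
```

Each triplet sums to one, so the three learning rates act as weights that trade off the terms of the multi-objective function.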

Some practical challenges, however, can affect the accuracy of process modeling. Such challenges include the value of the learning rate α, the fuzziness parameter, and the presence of noise in the data (the value of the signal-to-noise ratio (SNR)). The effect of these challenges on the performance of the IPCA, IkPCA, IRFPCA and IRkFPCA techniques is investigated in Section 4.1.1.

4.1.1. Mean square modeling error analysis

To compare the techniques in terms of modeling error, the mean square modeling error (MSME) criterion is used. It is calculated on the data with respect to the noise-free data:

$$
\mathrm{MSME} = E\left[(X - \hat{X})^{2}\right],
\tag{36}
$$

where X (resp. X̂) is the true model (resp. the estimated model). Next, we present the impact of the learning rate α, the fuzziness parameter, and the noise variance on the MSME.

4.1.2. Effect of learning rate on the mean square modeling error

To study the effect of the learning rate on the modeling performance of IRFPCA and IRkFPCA, the modeling performance is analyzed for different values of α1, α2 and α3. Fig. 3 shows the mean square modeling error versus α1, α2 and α3. For example, Fig. 3(d) shows the MSME versus α1, where α2 and α3 are fixed to 0.1 and (1 − α1 − α2), respectively. Fig. 3 shows that, under the different simulation conditions, the IRkFPCA method performs better than IRFPCA and that both provide good modeling (i.e., a small mean square modeling error).

4.1.3. Effect of noise content on the mean square modeling error

To investigate the effect of noise on the modeling performance of IRFPCA and IRkFPCA, different noise levels are considered. Fig. 4 shows the modeling error for noise variances ranging from 0 to 1. It shows that the mean square modeling error increases with the noise variance σv², and that both the IRFPCA and IRkFPCA methods provide a small mean square modeling error at the different noise levels.

4.1.4. Effect of fuzziness parameter on the mean square modeling error

To study the effect of the fuzziness parameter on the modeling error of the IRFPCA and IRkFPCA methods, the mean square modeling error is computed with different values of the fuzziness parameter (varies from

[Figure 4 panels: (a) IRFPCA and (b) IRkFPCA mean square modeling error versus noise variance σv² (0 to 1).]

Fig. 4. Mean square modeling error analysis versus noise variance.


[Figure 5 panels: (a) IRFPCA and (b) IRkFPCA mean square modeling error versus fuzzy parameter (1 to 2).]

1 to 2). Fig. 5 shows that the developed IRkFPCA technique performs well over this range of the fuzziness parameter and provides a small mean square modeling error. Next, in Section 4.2, we apply the developed IRkFPCA through its two charts, T² and Q, to detect faults using the simulated model (34). IRkFPCA iteratively computes the PCs, which are used to compute the model and the two fault detection indices, T² and Q.

4.2. IRkFPCA and application to fault detection

As described in Algorithm 1, the fault-free training data were used to construct an IRkFPCA reference model to be used in fault detection. The fault-free data were arranged as a matrix Xtr having 500 rows (samples) and 4 columns (model measurements). These data are first scaled (to have zero mean and unit variance) and then used to construct the IRkFPCA model. The responses of the training fault-free data are shown in Fig. 6. In IRkFPCA, most of the crucial variations in the data set are typically captured in the main PCs corresponding to the largest eigenvalues, as shown in Fig. 7. In this study, the cumulative percent variance (CPV) method [76] is used to determine the optimum number of retained PCs with a threshold value of 90%,

Fig. 5. Mean square modeling error versus fuzziness parameter using IRFPCA and IRkFPCA methods.

[Figure 7: variance (0 to 1, i.e., 0% to 100%) captured by principal components 1 to 3.]

Fig. 7. Variance captured by each principal component.
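The CPV selection rule can be sketched in a few lines: principal components are added in decreasing order of captured variance until the cumulative share reaches the threshold. The sketch assumes that the shares are already sorted and expressed in percent of the total variance; the function name is hypothetical, and the three shares are the values reported for this example (57.99%, 27.30% and 12.73%).

```python
def select_pcs_by_cpv(variance_percent, threshold=90.0):
    """Smallest number of leading PCs whose cumulative percent variance
    reaches the threshold; returns (number_of_pcs, cumulative_percent)."""
    cumulative = 0.0
    for count, share in enumerate(variance_percent, start=1):
        cumulative += share
        if cumulative >= threshold:
            return count, cumulative
    # Threshold never reached: keep every component that was supplied.
    return len(variance_percent), cumulative


# Shares reported for the synthetic example, 90% threshold.
n_pcs, cpv = select_pcs_by_cpv([57.99, 27.30, 12.73], threshold=90.0)
```

With these shares the first two components capture 85.29% of the variance, which is below the 90% threshold, so the third component is also retained.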

which results in retaining three PCs (capturing 57.99%, 27.30%, and 12.73% of the total variation, respectively), as shown in Fig. 7. The IRkFPCA model built from the fault-free data is used in this section to detect possible faults in unseen testing data. The testing data set contains 500 samples that are independent of the training data. Single, double and multiple faults in the state variable X1 are considered. Now, the performances of the different fault detection indices will

[Figure 6: four panels showing the training fault-free data Xtr1, Xtr2, Xtr3 and Xtr4 versus sample number (0 to 500).]

Fig. 6. The time evolution of the training fault-free data Xtr.


[Figure 8: four panels showing the testing faulty data Xtst1, Xtst2, Xtst3 and Xtst4 versus sample number (0 to 500).]

Fig. 8. The time evolution of the testing faulty data in the presence of a single fault in X1.

be assessed. To assess the fault detection abilities of the IRkFPCA technique through its indices T² and Q, three different cases of faults are considered: (i) a single additive fault introduced in X1, consisting of a bias of amplitude equal to 20% of the total variation in X1, between sample numbers 200 and 300 (see Fig. 8); (ii) double faults introduced in X1 (see Fig. 10); and (iii) multiple faults introduced in X1 (see Fig. 12).

The IRkFPCA technique is initially run using the training fault-free data. Based on the first three PCs, the IRkFPCA model-based T² and Q statistics are used for fault detection. The results of the Q statistic are shown in Fig. 9(a), where the dotted line represents the detection threshold Qα, which is found to be 5.637. Fig. 9(b) presents the results of the T² statistic, where the dotted line represents the detection threshold Tα², which is found to be 6.052. Fig. 9(a) shows that the Q statistic remains above the threshold Qα over the interval [200 . . . 300], where the fault is present. A statistic below its threshold, by contrast, means that the data fit the IRkFPCA model well (since the model captures most of the variations in the data) and belong to the normal operating region. The Q and T² statistics thus show the ability of the IRkFPCA technique to detect this additive fault (see Fig. 9(a) and (b)).

Double faults in the state variable X1 are introduced in the intervals [100 . . . 150] and [250 . . . 350], respectively. These faults are represented by a constant bias of amplitude equal to 20% of the total variation in X1. Fig. 11 shows the results of the IRkFPCA-based Q and T² statistics, which are able to detect these additive faults. Fig. 13 shows that the IRkFPCA-based Q and T² statistics can also detect multiple faults without any false alarms, where the multiple faults in the state variable X1 are introduced in the intervals [100 . . . 150], [250 . . . 350] and [400 . . . 450], respectively. Next, the developed IRkFPCA algorithm is illustrated through its application to a controlled continuous stirred tank reactor (CSTR) in which a non-isothermal, irreversible first-order reaction A → B takes place [77].

4.3. IRkFPCA and application to fault detection in simulated CSTR model

Next, the CSTR model that is used for fault detection is described.

4.3.1. CSTR process description

The dynamic model for the non-isothermal CSTR can be given by [77,78]:

$$
\begin{aligned}
\frac{\partial C_A}{\partial t} &= \frac{F}{V}\,(C_{A0} - C_A) - k_0 e^{-E/RT} C_A, \\
\frac{\partial T}{\partial t} &= \frac{F}{V}\,(T_0 - T) + \frac{(-\Delta H)\,k_0}{\rho C_p}\, e^{-E/RT} C_A - \frac{Q}{\rho C_p V}, \\
Q &= \frac{a F_c^{\,b+1}}{F_c + \dfrac{a F_c^{\,b}}{2 \rho_c C_{pc}}}\,(T - T_{cin}),
\end{aligned}
\tag{37}
$$

where k0 is the reaction rate constant, E is the activation energy, CA is the concentration of “A” in the inlet stream, CB is the concentration of “B” in the exit stream, T is the temperature of the inlet stream, F is the flow rate in and out of the reactor, V is the reactor volume, Ti is the temperature of the exit stream, Tj is the temperature of the cooling fluid in the jacket, ΔH is the heat of reaction, U is the overall heat transfer coefficient, A is the area through which heat transfers from the reactor to the cooling jacket, and ρ and cp are the density and heat capacity of the reactor contents and of all streams. Assuming a stoichiometric proportion of compounds “A” and “B” in the feed, one can assume that CB(t) = 2CA(t). The outlet temperature (T) and the concentration (CA) are controlled using proportional integral (PI) controllers by manipulating the inlet coolant flow rate (FC) and the feed flow rate (F), respectively. The parameters of the PI controllers are as follows: KC1 = −0.8 and τ1 = 0.1 for the temperature controller, and KC2 = 2 and τ1 = 0.1 for the concentration controller.

4.3.2. Generation of dynamic data

In a practical setting, the data would be collected by changing the feed flow rate (which is chosen in this example to be the model input, i.e., Q), and then measuring the state variables, i.e., the concentration and temperature as functions of time. Here, the data are generated given some pre-defined model parameters. The CSTR model parameters as well as other physical properties are shown in Table 2. The simulated CSTR is used to generate training and testing data sets by changing the set points of the concentration and temperature controllers in a step-wise fashion. To better represent practical process measurements, the two output variables (concentration and temperature), which are assumed to be noise-free, are then


[Figure 9 panels: (a) Q statistic and (b) Hotelling's T² statistic in the presence of a single fault, each plotted with its threshold versus sample number (0 to 500).]

Fig. 9. The time evolution of the Q and T² statistics on a semi-logarithmic scale in the presence of a single fault in X1.

contaminated with zero-mean Gaussian noise having standard deviations of σc = 0.005 and σT = 0.02, respectively. The process data used in training include four variables: the coolant flow rate (FC), the feed flow rate (F), the outlet concentration (CA), and the reactor outlet temperature (T). Thus, the data matrix, which has 1000 rows and 4 columns, is used to construct the IRkFPCA model after scaling the variables. Next, the performance of the developed IRkFPCA fault detection method is illustrated through its two charts, Q and T². The fault detection performance is assessed through three different case

[Figure 10: four panels showing the testing faulty data Xtst1, Xtst2, Xtst3 and Xtst4 versus sample number (0 to 500).]

Fig. 10. The time evolution of the testing faulty data in the presence of double faults in X1.


[Figure 11 panels: (a) Q statistic and (b) Hotelling's T² statistic in the presence of double faults, each plotted with its threshold (Qα, Tα²) versus sample number (0 to 500).]

Fig. 11. The time evolution of the Q and T² statistics on a semi-logarithmic scale in the presence of double faults in X1.

[Figure 12: four panels showing the testing faulty data Xtst1, Xtst2, Xtst3 and Xtst4 versus sample number (0 to 500).]

Fig. 12. The time evolution of the testing faulty data in the presence of multiple faults in X1.


[Figure 13 panels: (a) Q statistic and (b) Hotelling's T² statistic in the presence of multiple faults, each plotted with its threshold (Qα, Tα²) versus sample number (0 to 500).]

Fig. 13. The time evolution of the Q and T² statistics on a semi-logarithmic scale in the presence of multiple faults in X1.

studies representing three different types of faults. In the first case study, the sensor measuring the concentration of A (CA) is assumed to be faulty, with single as well as multiple faults. In the second case study, similar faults (single and multiple) are introduced in the temperature of the reactor (T). In the third case study, multiple faults are assumed to occur simultaneously in the concentration and temperature inside the reactor.

Table 2 CSTR model parameters and physical properties.

Parameter          Value            Parameter      Value
E (J/mol)          76,534           V (l)          100
−ΔH (J/mol)        596,619          ρ (g/l)        1000
k0 (l/min mol)     4.11 × 10^13     cp (J/g K)     4.2
CAi (mol/l)        1                Tj (K)         250
Ti (K)             350              UA (W K)       5 × 10^4
R (J/mol K)        8.31451

4.3.3. Case 1: Faults in the concentration CA

The testing data used to evaluate the fault detection performance, which consist of 500 samples, are generated using the CSTR model described earlier. To simulate a single fault in the state variable CA, an additive fault having a magnitude of 20% of the total variation in CA is introduced between samples 100 and 150. The results using the IRkFPCA-based Q and IRkFPCA-based T² techniques (Fig. 14(a) and (b)) show that both successfully detect this single fault. In the presence of multiple faults in CA, Fig. 15(a) and (b) show the monitoring results of the CSTR process using the IRkFPCA-based T² and IRkFPCA-based Q techniques. Both statistics exceed their thresholds when the faults occur, but with some false alarms when using the T² statistic (see Fig. 15(b)) and missed detections when using the Q statistic (see Fig. 15(a)).
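False alarms and missed detections of this kind can be quantified once the fault intervals are known. The following sketch counts both for one monitoring statistic; the rates are not defined in the text, so the usual definitions are assumed (false alarms counted over fault-free samples, missed detections over faulty samples), sample indices are taken as zero-based and inclusive, and the function name and toy values are hypothetical.

```python
def detection_rates(statistic, threshold, fault_intervals):
    """Return (false_alarm_rate, missed_detection_rate) for one statistic.

    statistic       : one value of T2 or Q per sample
    fault_intervals : list of (start, end) sample indices, inclusive,
                      during which a fault is present (assumed known)
    """
    def in_fault(k):
        return any(start <= k <= end for start, end in fault_intervals)

    false_alarms = missed = n_normal = n_faulty = 0
    for k, value in enumerate(statistic):
        alarm = value >= threshold
        if in_fault(k):
            n_faulty += 1
            missed += not alarm          # faulty sample that raised no alarm
        else:
            n_normal += 1
            false_alarms += alarm        # fault-free sample that raised an alarm
    far = false_alarms / n_normal if n_normal else 0.0
    mdr = missed / n_faulty if n_faulty else 0.0
    return far, mdr


# Toy statistic with a fault on samples 2 to 3 and a threshold of 1.0.
far, mdr = detection_rates([0.5, 0.7, 3.0, 0.4, 2.5, 0.6], 1.0, [(2, 3)])
```

Applied to both statistics, such counts would make the qualitative comparison between the T² chart (false alarms) and the Q chart (missed detections) explicit.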

4.3.4. Case 2: Faults in the temperature T

In this case study, the sensor measuring the temperature T is assumed to be faulty, with single as well as multiple faults. First, a single fault in the reactor temperature, represented by a constant bias of amplitude equal to 5% of the total variation in T, is introduced between sample numbers 100 and 150. Fig. 16(a) and (b) show the ability of the IRkFPCA-based Q and IRkFPCA-based T² methods to detect this additive fault. Next, we consider multiple faults in the sensor measuring the temperature T in the intervals [100 . . . 150], [250 . . . 350] and [400 . . . 450]. Fig. 17(a) and (b) show the fault detection results using the IRkFPCA-based Q and IRkFPCA-based T² techniques. The


[Figure 14 panels: (a) Q statistic and (b) Hotelling's T² statistic in the presence of a single fault in CA, each plotted with its threshold versus sample number (0 to 500).]

Fig. 14. The time evolution of the Q and T² statistics on a semi-logarithmic scale in the presence of a single fault in CA.

[Figure 15 panels: (a) Q statistic and (b) Hotelling's T² statistic in the presence of multiple faults in CA, versus sample number (0 to 500).]

Fig. 15. The time evolution of the Q and T² statistics on a semi-logarithmic scale in the presence of multiple faults in CA.

[Figure 16 panels: (a) Q statistic and (b) Hotelling's T² statistic in the presence of a single fault in T, each plotted with its threshold versus sample number (0 to 500).]

Fig. 16. The time evolution of the Q and T² statistics on a semi-logarithmic scale in the presence of a single fault in T.

[Figure 17 panels: (a) Q statistic and (b) Hotelling's T² statistic in the presence of multiple faults in T, versus sample number (0 to 500).]

Fig. 17. The time evolution of the Q and T² statistics on a semi-logarithmic scale in the presence of multiple faults in T.


[Figure 18 panels: (a) Q statistic and (b) Hotelling's T² statistic in the presence of simultaneous faults in CA and T, versus sample number (0 to 500).]

Fig. 18. The time evolution of the Q and T² statistics on a semi-logarithmic scale in the presence of simultaneous faults in CA and T.

IRkFPCA-based Q and IRkFPCA-based T² techniques are able to detect the multiple faults (see Fig. 17(a) and (b)).

4.3.5. Case 3: Faults in the concentration CA and temperature T

In this case study, simultaneous faults are introduced in both the concentration and the temperature (each of which is represented by a bias of magnitude equal to 20% of the variation in its corresponding variable). The results using the IRkFPCA-based Q and IRkFPCA-based T² techniques for these simultaneous faults are shown in Fig. 18(a) and (b).

5. Conclusion

In this paper, we developed an Iterated Robust kernel Fuzzy Principal Component Analysis (IRkFPCA) method, which combines the advantages of the state-of-the-art methods and uses a more accurate multi-objective function for jointly reducing the modeling errors, optimizing the robustness to outliers and improving the memory efficiency, since it does not require the storage and inversion of the covariance matrix to obtain a memory-efficient approximation of kernel PCA. The developed IRkFPCA method iteratively computes the principal components, which are used for modeling and fault detection tasks. The fault detection task relies on the computation of the detection charts, which are used to detect the faults. The fault detection performance of the IRkFPCA method is illustrated through two simulated examples, one using synthetic data and the other using simulated continuously stirred tank reactor (CSTR) data. The results of the comparative studies reveal that the developed IRkFPCA method provides better modeling and fault detection accuracies than the Iterated Robust Fuzzy Principal Component Analysis (IRFPCA) and Iterated kernel Principal Component Analysis (IkPCA) methods, while both of these methods provide improved accuracy over the Iterated Principal Component Analysis (IPCA) method. The benefit of the IRkFPCA method lies in its ability to use many factors that may influence the relevance of the modeling and monitoring in process operations.

We have identified several directions for extending this research. First, we propose to improve this work using a Pareto multi-objective optimization to find good compromises (or trade-offs) for model selection. The multi-objective optimization method optimizes the model selection scheme and estimates the execution times. Second, we propose to use a generalized likelihood ratio test (GLRT) for fault detection. GLRT has been proven to be an effective fault detection approach in the presence of process models [79,80]. Hence, KPCA would be used to create the model and find nonlinear combinations of parameters that describe the major trends in a data set, GLRT would be used to detect the faults, and both would be combined to improve the fault detection process.

Acknowledgement

This work was made possible by NPRP grant NPRP7-1172-2-439 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

References

[1] M. Al-Maslamani, Assessment of atmospheric emissions due to anthropogenic activities in the State of Qatar (Ph.D. thesis), Institute for the Environment – Brunel University, 2011. [2] M. Poumadere, C. Mays, S.L. Mer, R. Blong, The 2003 HeatWave in France: Dangerous climate change here and now, Risk Anal. 25 (6) (2005) 1483–1494. [3] V. Venkatasubramanian, R. Rengaswamy, S. Kavuri, K. Yin, A review of process fault detection and diagnosis. Part I: quantitative model-based methods, Comput. Chem. Eng. 27 (2003) 293–311. [4] Y. Qingsong, Model-based and data driven fault diagnosis methods with applications to process monitoring (PhD. thesis), Case Western reserve University, 2004, May. [5] V. Venkatasubramanian, R. Rengaswamy, S. Kavuri, K. Yin, A review of process fault detection and diagnosis. Part III: process history based methods, Comput. Chem. Eng. 27 (2003) 327–346. [6] K. Chaitanya, Data-based modeling: application in process identification, monitoring and fault detection (PhD. thesis), 2011. [7] S. Kim, S. Ahn, J. Chung, I. Hwang, S. Kim, M. No, S. Sin, A rule based approach to network fault and security diagnosis with agent collaboration, in: Lecture Notes in Computer Science, 2004. [8] N. Wilcox, D. Himmelblau, The possible cause and effect graphs (PCEG) model for fault diagnosis. I. Methodology, Comput. Chem. Eng. 18 (2) (1994) 103–116. [9] R. Wirth, B. Berthold, A. Kramer, G. Peter, Knowledge-based support of system analysis for the analysis of failure modes and effects, Eng. Appl. Artif. Intell. 9 (3) (1996) 219–229. [10] V. Venkatasubramanian, R. Rengaswamy, S. Kavuri, K. Yin, A review of process fault detection and diagnosis. Part II: qualitative models and search strategies, Comput. Chem. Eng. 27 (2003) 313–326. [11] T.T.V. Sylvain, K. Abdessamad, Fault detection with Bayesian network, in: A. Zemliak (Ed.), Frontiers in Robotics, Automation and Control, Intech, 2008. [12] M. 
Kinnaert, Fault diagnosis based on analytical models for linear and nonlinear systems – a tutorial, in: Proceedings of the 15th International Workshop on Principles of Diagnosis, 2003, pp. 37–50. [13] M. Nyberg, C.M. Nyberg, Model based fault diagnosis: methods, theory, and automotive engine applications (PhD. thesis), 1999. [14] R. Clark, D. Fosth, V. Walton, Detecting instrument malfunctions in control systems, IEEE Trans. Aerosp. Electron. Syst. vol. AES-11 (4) (1975) 465–473. [15] R.J. Patton, P. Frank, R. Clarke (Eds.), Fault Diagnosis in Dynamic Systems: Theory and Application, Prentice-Hall, Inc., 1989. [16] A. Xu, Observateurs adaptatifs non-linéaires et diagnostic de pannes (Ph.D. dissertation), Université de Rennes 1, 2002. [17] M. Staroswiecki, Redondance analytique, in: Automatique et statistiques pour le diagnostic, Hermes Science Europe, 2001. [18] X. Ding, P. Frank, Fault detection via factorization approach, Syst. Control Lett. 14 (5) (1990) 431–436. [19] R.J. Patton, J. Chen, A review of parity space approaches to fault diagnosis, in: Proceedings of SAFEPROCESS’91, 1991, pp. 239–255.




[20] E. Chow, A. Willsky, Analytical redundancy and the design of robust failure detection systems, IEEE Trans. Autom. Control 29 (7) (1984) 603–614.
[21] J. Ragot, D. Maquin, F. Kratz, Analytical redundancy for systems with unknown inputs – application to fault detection, Control Theory Adv. Technol. 9 (3) (1993) 775–788.
[22] O. Adrot, Diagnostic à base de modèles incertains utilisant l’analyse par intervalles: l’approche bornante (Ph.D. dissertation), Institut National Polytechnique de Lorraine, 2000.
[23] O. Adrot, S. Ploix, J. Ragot, Caractérisation des incertitudes dans un modèle linéaire statique, J. Eur. Syst. Autom. 36 (6) (2002) 799–824.
[24] K. Benothman, D. Maquin, J. Ragot, M. Benrejeb, Diagnosis of uncertain linear systems: an interval approach, Int. J. Sci. Tech. Autom. Control Comput. Eng. 1 (2) (2007) 136–154.
[25] F. Harrou, M. Nounou, H. Nounou, Statistical detection of abnormal ozone levels using principal component analysis, Int. J. Eng. Technol. 12 (6) (2012) 54–59.
[26] L. Chiang, E. Russell, R. Braatz, Fault Detection and Diagnosis in Industrial Systems, Springer, London, 2001.
[27] P. Subbaraj, B. Kannapiran, Artificial neural network approach for fault detection in pneumatic valve in cooler water spray system, Int. J. Comput. Appl. 9 (7) (2010) 43–52.
[28] A. Dexter, M. Benouarets, Generic approach to identifying faults in HVAC plants, ASHRAE Trans. 102 (1) (1996) 550–556.
[29] K. Mohammadi, R. Asgary, Pattern recognition and fault detection in MEMS, Comput. Recogn. Syst. Adv. Soft Comput. 30 (2005) 877–884.
[30] D. Dehestani, F. Eftekhari, Y. Guo, S. Ling, S. Su, H. Nguyen, Online support vector machine application for model based fault detection and isolation of HVAC system, Int. J. Mach. Learn. Comput. 1 (1) (2011) 66–72.
[31] C. Batur, L. Zhou, C.-C. Chan, Support vector machines for fault detection, in: Proceedings of the IEEE Conference on Decision and Control, Las Vegas, NV, 2002, pp. 1355–1356.
[32] M. Harkat, G. Mourot, J. Ragot, An improved PCA scheme for sensor FDI: application to an air quality monitoring network, J. Process Control 16 (6) (2006) 625–634.
[33] B. Wise, N. Gallagher, The process chemometrics approach to process monitoring and fault detection, J. Process Control 6 (1996) 329–348.
[34] A. Simoglou, E. Martin, A. Morris, Multivariate statistical process control in chemicals manufacturing, in: IFAC Conference SAFEPROCESS, Hull, UK, 1997, pp. 21–27.
[35] J. George, Z. Chen, P. Shaw, Fault detection of drinking water treatment process using PCA and Hotelling’s T² chart, World Acad. Sci. Eng. Technol. 50 (2009) 970–975.
[36] Y. Tharrault, Diagnostic de fonctionnement par analyse en composantes principales: Application à une station de traitement des eaux usées (Ph.D. dissertation), National Polytechnic Institute of Lorraine, 2008.
[37] C. Nascimento, J. Martins, Pharmacophoric profile: design of new potential drugs with PCA analysis, in: P. Sanguansat (Ed.), Principal Component Analysis – Multidisciplinary Applications, Intech, 2012.
[38] F. Reverter, E. Vegas, J. Oller, Kernel methods for dimensionality reduction applied to the omics data, in: P. Sanguansat (Ed.), Principal Component Analysis – Multidisciplinary Applications, Intech, 2012.
[39] D. Magyar, G. Oros, Application of the principal component analysis to disclose factors influencing on the composition of fungal consortia deteriorating remained fruit stalks on sour cherry trees, in: P. Sanguansat (Ed.), Principal Component Analysis – Multidisciplinary Applications, Intech, 2012.
[40] E. Belasco, B. Philips, G. Gong, The health care access index as a determinant of delayed cancer detection through principal component analysis, in: P. Sanguansat (Ed.), Principal Component Analysis – Multidisciplinary Applications, Intech, 2012.
[41] J. Yu, Fault detection using principal components-based Gaussian mixture model for semiconductor manufacturing processes, IEEE Trans. Semicond. Manuf. 24 (3) (2011) 471–486.
[42] J. MacGregor, T. Kourti, Statistical process control of multivariate processes, Control Eng. Pract. 3 (3) (1995) 403–414.
[43] Y. Tharrault, G. Mourot, J. Ragot, D. Maquin, Fault detection and isolation with robust principal component analysis, Int. J. Appl. Math. Comput. Sci. 18 (4) (2008) 429–442.
[44] M.-F. Harkat, G. Mourot, J. Ragot, An improved PCA scheme for sensor FDI: application to an air quality monitoring network, J. Process Control 16 (6) (2006) 625–634.
[45] S.J. Qin, R. Dunia, Determining the number of principal components for best reconstruction, J. Process Control 10 (2) (2000) 245–250.
[46] M. Tamura, S. Tsujita, A study on the number of principal components and sensitivity of fault detection using PCA, Comput. Chem. Eng. 31 (9) (2007) 1035–1046.
[47] A. Benaicha, M. Guerfel, N. Bouguila, K. Benothman, New PCA-based methodology for sensor fault detection and localization, in: 8th International Conference of Modeling and Simulation, MOSIM, 2010, vol. 10.
[48] Z. David, B. Marta, From large chemical plant data to fault diagnosis integrated to decentralized fault-tolerant control: pulp mill process application, Ind. Eng. Chem. Res. 47 (4) (2008) 1201–1220.
[49] J. Chen, J.A. Bandoni, J.A. Romagnoli, Robust PCA and normal region in multivariate statistical process monitoring, AIChE J. 42 (12) (1996) 3563–3566.
[50] C. Croux, G. Haesbroeck, Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies, Biometrika 87 (3) (2000) 603–618.
[51] J.-M. Lee, C. Yoo, S.W. Choi, P.A. Vanrolleghem, I.-B. Lee, Nonlinear process monitoring using kernel principal component analysis, Chem. Eng. Sci. 59 (1) (2004) 223–234.
[52] S. Choi, J.L.C.K. Lee, J. Park, I.B. Lee, Fault detection and identification of nonlinear processes based on kernel PCA, Chemometr. Intell. Lab. Syst. 75 (1) (2005) 55–67.
[53] V.H. Nguyen, J.-C. Golinval, Fault detection based on kernel principal component analysis, Eng. Struct. 32 (11) (2010) 3683–3691.
[54] B. Raoudha, M. Majdi, N. Mohamed, N. Hazem, B.H. Ahmed, Iterated robust fuzzy principal component analysis, 2015 (under review).
[55] K.I. Kim, M.O. Franz, B. Schölkopf, Iterative kernel principal component analysis for image modeling, IEEE Trans. Pattern Anal. Mach. Intell. 27 (9) (2005) 1351–1366.
[56] S. Günter, N. Schraudolph, S. Vishwanathan, Fast iterative kernel principal component analysis, J. Mach. Learn. Res. 8 (2007) 1893–1918.
[57] J. Jackson, G. Mudholkar, Control procedures for residuals associated with principal component analysis, Technometrics 21 (1979) 341–349.
[58] M. Zhu, A. Ghodsi, Automatic dimensionality selection from the scree plot via the use of profile likelihood, Comput. Stat. Data Anal. 51 (2006) 918–930.
[59] G. Diana, C. Tommasi, Cross-validation methods in principal component analysis: a comparison, Stat. Methods Appl. 11 (1) (2002) 71–82.
[60] I. Jolliffe, Principal Component Analysis, second ed., Springer, Berlin, 2002.
[61] E. Oja, Principal components, minor components, and linear neural networks, Neural Netw. 5 (6) (1992) 927–935.
[62] S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[63] G. Heo, P. Gader, H. Frigui, RKF-PCA: robust kernel fuzzy PCA, Neural Netw. 22 (5) (2009) 642–650.
[64] P. Teppola, S.-P. Mujunen, P. Minkkinen, Adaptive fuzzy c-means clustering in process monitoring, Chemometr. Intell. Lab. Syst. 45 (1) (1999) 23–38.
[65] P. Luukka, A new nonlinear fuzzy robust PCA algorithm and similarity classifier in classification of medical data sets, Int. J. Fuzzy Syst. 13 (3) (2011) 153–162.
[66] L. Xu, A.L. Yuille, Robust principal component analysis by self-organizing rules based on statistical physics approach, IEEE Trans. Neural Netw. 6 (1) (1995) 131–143.
[67] G. Mussardo, Statistical Field Theory, Oxford Univ. Press, 2010.
[68] E. Oja, The Nonlinear PCA Learning Rule and Signal Separation: Mathematical Analysis, Citeseer, 1995.
[69] T.D. Sanger, Optimal unsupervised learning in a single-layer linear feedforward neural network, Neural Netw. 2 (6) (1989) 459–473.
[70] C. Darken, J. Moody, Towards faster stochastic gradient search, in: NIPS, 1991, pp. 1009–1016.
[71] H. Hotelling, Analysis of a complex of statistical variables into principal components, J. Educ. Psychol. 24 (1933) 417–441.
[72] E.L. Russell, L.H. Chiang, R.D. Braatz, Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis, Chemometr. Intell. Lab. Syst. 51 (1) (2000) 81–93.
[73] J.E. Jackson, G.S. Mudholkar, Control procedures for residuals associated with principal component analysis, Technometrics 21 (3) (1979) 341–349.
[74] S. Qin, Statistical process monitoring: basics and beyond, J. Chemometr. 17 (8/9) (2003) 480–502.
[75] A. Benaicha, M. Guerfel, N. Bouguila, K. Benothman, New PCA-based methodology for sensor fault detection and localization, in: MOSIM’10, Hammamet, Tunisia, 10–12 May 2010.
[76] S. Valle, W. Li, S.J. Qin, Selection of the number of principal components: the variance of the reconstruction error criterion with a comparison to other methods, Ind. Eng. Chem. Res. 38 (11) (1999) 4389–4401.
[77] A. Vajesta, R. Schmitz, An experimental study of steady-state multiplicity and stability in an adiabatic stirred reactor, AIChE J. 3 (1970) 410–419.
[78] M.M. Mansouri, H.N. Nounou, M.N. Nounou, A.A. Datta, State and parameter estimation for nonlinear biological phenomena modeled by s-systems, Digit. Signal Process. 28 (2014) 1–17.
[79] T. Severini, Likelihood Methods in Statistics, Oxford University Press, Oxford, 2000.
[80] Y. Pawitan, In All Likelihood: Statistical Modeling and Inference using Likelihood, Oxford University Press, Oxford, 2001.

