18th Mediterranean Conference on Control & Automation Congress Palace Hotel, Marrakech, Morocco June 23-25, 2010

A Generalized Variance of Reconstruction Error Criterion for Determining the Optimum Number of Principal Components

Baligh Mnassri*, El Mostafa El Adel*, Bouchra Ananou*, Mustapha Ouladsine*

Abstract: One of the main difficulties in using Principal Component Analysis (PCA) is selecting the number of Principal Components (PCs) that constitute the optimal PCA model. The well-established Variance of Reconstruction Error (VRE) criterion has been proposed to determine this number: it selects the model that gives the best reconstruction of the variables by minimizing the reconstructed Squared Prediction Error (SPE) index. This classical VRE criterion behaves well when all variables are correlated, because it determines the number of redundancies in the data, but it does not take the uncorrelated variables into account. However, to represent the system in an optimal way, the number of PCs in the PCA model must equal the number of redundancies plus the number of independent variables in the data. It is well known that the reconstruction depends on the detection index used; consequently, the best reconstruction is not unique and can be obtained in different ways. In this paper, we generalize the VRE criterion to any quadratic detection index, and we show that the optimum number of PCs can be chosen by minimizing the VRE of the combined index.

*The authors are with the Laboratoire des Sciences de l'Information et des Systèmes (LSIS), Université Paul Cézanne (Aix-Marseille III), Avenue Escadrille Normandie-Niémen, 13397 Marseille Cedex 20, France.

978-1-4244-8092-0/10/$26.00 ©2010 IEEE

I. INTRODUCTION

The use of principal component analysis (PCA) for monitoring and analysing many different industrial processes has been studied intensively. PCA is a proven technique in multivariate statistical analysis, and it has more recently been proposed as a technique in multivariate statistical process control (MSPC) for the on-line detection and isolation of faults [1], [3]. For the purpose of detection, PCA is used to build an empirical static linear model from data measured under normal operating conditions. A key issue in developing a PCA model is the choice of an adequate number of principal components (PCs) to represent the system in an optimal way. Although the PCA method is widely used, selecting the optimum number of PCs is rather subjective, and the choice is not unique. A bad choice has two possible consequences. If fewer PCs are selected than required, a poor model is obtained and the process is represented incompletely. Conversely, if more PCs are retained than necessary, the model is over-parameterized and includes noise [4]. To avoid these problems, several criteria for selecting the optimum number of PCs have been proposed in the literature [4], such as the scree plot, eigenvalue limits, the cumulative percent variance, cross-validation and the variance of reconstruction error (VRE) [2], [4]. The scree plot and eigenvalue-limit approaches consider that components with small eigenvalues are not important for modelling. The cumulative percent variance approach selects the model of minimum dimension that still expresses a substantial part of the total variance of the data. The cross-validation method uses part of the training samples to build the model and compares the remaining samples with their predictions from the model; a new component is added to the model when the prediction residual sum of squares becomes smaller than the residual sum of squares of the previous model. Most of these approaches use an index that decreases monotonically with the number of PCs. The decrement of the index may be nearly constant, and more than one location may satisfy the criterion; this is why these methods are called subjective. The last criterion, the variance of reconstruction error, is based on the best reconstruction of the variables. An important feature of this approach is that it has a minimum corresponding to the best reconstruction. When the PCA model is used to reconstruct missing values, the reconstruction error is a function of the number of PCs. As a result, the VRE always has a minimum, which points to the optimal number of PCs for best reconstruction [3], [5].

In PCA-based process monitoring, several indices have been widely used for fault detection [2], [9]. Hotelling's $T^2$ statistic measures the variation within the PCA model and is expressed in the principal component subspace (PCS), while Hawkins's $T_H^2$ (or Squared Weighted Error, SWE) statistic is its counterpart in the residual subspace (RS). The Squared Prediction Error (SPE), computed in the RS, indicates how much each sample deviates from the model. The combined index uses the $T^2$ and SPE statistics simultaneously, each weighted by its control limit.
These indices are all quadratic functions of the sample vector [3]. In this paper, we show that the existing VRE criterion is computed by minimizing only the reconstructed SPE index. As a result, the retained number of PCs is not always optimum: it represents the number of redundancies in the data, because the uncorrelated variables are rejected by this criterion. It has recently been shown that the reconstruction of missing values depends on the detection index used [6]. Consequently, the VRE criterion is not unique and can be computed in different ways. This leads us to propose a generalized VRE formula that depends on the index studied. In the first step of our work, we improve the classical criterion related to the SPE index. Secondly, we propose a VRE that incorporates the correlated and uncorrelated variables in a balanced way; this criterion minimizes the variance of the reconstruction error based on the combined index. An important feature of this approach is that it has a minimum, and this minimum depends on a significance level.

The paper is organized as follows. Section II presents an overview of the linear PCA approach, the fault detection indices (SPE, SWE, $T^2$ and combined index) and their control limits. Section III deals with the fault reconstruction method and our proposed generalized VRE criterion. The proposed methods are applied to simulation examples in Section IV, and Section V concludes the paper.

II. FAULT MODELLING AND DETECTION INDICES

A. Linear PCA

Let $x \in \mathbb{R}^m$ denote a sample vector of $m$ variables. Assuming that $N$ samples are available for each variable, a data matrix $X \in \mathbb{R}^{N\times m}$ is formed, with each row representing a sample. To avoid scaling problems, the data are scaled to zero mean and unit variance. PCA determines an optimal linear transformation of the data $X$ [9], [10], which decomposes it into a score matrix $T$ and a loading matrix $P$:

$$ X = T P^T \tag{1} $$

with $T = [t_1 \cdots t_a \cdots t_m] \in \mathbb{R}^{N\times m}$ and $a \in \{1, \ldots, m\}$, where the vectors $t_a$ are called scores or PCs. The matrix $P = [p_1 \cdots p_a \cdots p_m] \in \mathbb{R}^{m\times m}$, whose orthogonal columns $p_a$ are called loading or principal vectors, contains the eigenvectors associated with the eigenvalues $\lambda_a$ of the correlation matrix $\Sigma$ of $X$:

$$ \Sigma = P \Lambda P^T \quad \text{with} \quad P P^T = P^T P = I_m \tag{2} $$

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_a, \ldots, \lambda_m)$ is the diagonal matrix of eigenvalues, sorted in decreasing order. Partitioning the eigenvector and principal component matrices gives respectively:

$$ P = [\hat{P}_\ell \mid \tilde{P}_{m-\ell}] = [\hat{P} \mid \tilde{P}], \qquad T = [\hat{T}_\ell \mid \tilde{T}_{m-\ell}] = [\hat{T} \mid \tilde{T}] \tag{3} $$

and, correspondingly, $\Lambda = \mathrm{diag}(\hat{\Lambda}, \tilde{\Lambda})$, where $\ell$ is the number of most significant PCs, which suffice to explain the variability of the process through its data $X$. The first $\ell$ eigenvectors span the representation space, or PCS, $S_p = \mathrm{span}\{\hat{P}\}$, whereas the RS is $S_r = \mathrm{span}\{\tilde{P}\}$. A sample vector $x$ can be projected onto the PCS and the RS respectively:

$$ \hat{x} = \hat{P}\hat{P}^T x = \hat{C} x \in S_p \tag{4} $$

$$ \tilde{x} = \tilde{P}\tilde{P}^T x = \tilde{C} x \in S_r \tag{5} $$

where $\hat{C}$ and $\tilde{C} = (I - \hat{C})$ are the projection matrices onto the PCS and the RS respectively, with $\ell < m$. With an optimum number of PCs, the diagonal elements of these two matrices should correctly indicate the linear correlation of each variable. Since $S_p$ and $S_r$ are orthogonal,

$$ \hat{x}^T \tilde{x} = \tilde{x}^T \hat{x} = 0 \quad \text{and} \quad x = \hat{x} + \tilde{x} \tag{6} $$

The vectors $\hat{x}$ and $\tilde{x}$ represent respectively the modelled and unmodelled variations of $x$ based on $\ell$ components. Determining the number of PCs amounts to choosing $\ell$ such that $\hat{x}$ contains most of the information and $\tilde{x}$ contains mostly noise. How to choose this number is the subject of the following sections.

B. Fault Detection Indices

Statistical process monitoring relies on normal process data to build process models. PCA is one of the most popular statistical methods for extracting information from measured data: it finds the directions of significant variability in the data by forming linear combinations of the variables, and PCA models are predominantly used to extract the correlations between variables.

Fault detection is usually the first step in MSPC. Many statistics have been proposed for detecting abnormal conditions, such as the SPE (or Q statistic) and Hotelling's $T^2$ (or D statistic), which represent the variability in the RS and in the PCS respectively. The remaining indices are extended forms of these. All these detection indices are quadratic functions that can be expressed in a unified form:

$$ \gamma(k) = x^T(k)\, \Upsilon_\gamma\, x(k) \tag{7} $$

where $\Upsilon_\gamma$ is the characteristic matrix of the studied index $\gamma$, a symmetric positive semi-definite matrix.

1) Hotelling's $T^2$ index: Hotelling's $T^2$ statistic measures variations in the PCS. Its expression and characteristic matrix are respectively:

$$ T^2(k) = x^T(k)\, \hat{P}\hat{\Lambda}^{-1}\hat{P}^T x(k) \tag{8} $$

$$ \Upsilon_{T^2} = \hat{P}\hat{\Lambda}^{-1}\hat{P}^T \tag{9} $$

When the process is under normal operating conditions and the data follow a multivariate normal distribution, the $T^2$ index can be approximated by a $\chi^2$ distribution with $\ell$ degrees of freedom at a significance level $\alpha$ [7]. The process operation is considered normal if:

$$ T^2(k) \le T_\alpha^2 = \chi^2_{\ell,\alpha} \tag{10} $$

2) Hawkins's $T_H^2$ or SWE index: Hawkins's $T_H^2$ statistic is the counterpart of Hotelling's $T^2$ index in the RS. This statistic and its characteristic matrix are respectively:

$$ T_H^2(k) = x^T(k)\, \tilde{P}\tilde{\Lambda}^{-1}\tilde{P}^T x(k) \tag{11} $$

$$ \Upsilon_{T_H^2} = \tilde{P}\tilde{\Lambda}^{-1}\tilde{P}^T \tag{12} $$

Similarly, the process is considered normal if:

$$ T_H^2(k) \le H_\alpha^2 = \chi^2_{m-\ell,\alpha} \tag{13} $$

3) SPE index: A change in the correlation of the variables indicates an unusual situation, because the variables no longer keep their normal relations. In this situation, the projection of the sample $x$ onto the RS increases. The SPE is the squared magnitude of $\tilde{x}$:

$$ SPE(k) = x^T(k)\, \tilde{C}\, x(k) \tag{14} $$

$$ \Upsilon_{SPE} = \tilde{P}\tilde{P}^T = \tilde{C} \tag{15} $$

The process is considered normal if the SPE statistic is below its control limit [7], [8]:

$$ SPE(k) \le \delta_\alpha^2 = g_{SPE}\, \chi^2_{h_{SPE},\alpha} \tag{16} $$

with $h_{SPE}$ degrees of freedom. The two parameters $g_{SPE}$ and $h_{SPE}$ are determined respectively by [7]:

$$ g_{SPE} = \frac{\mathrm{tr}[(\Sigma\Upsilon_{SPE})^2]}{\mathrm{tr}[\Sigma\Upsilon_{SPE}]} \tag{17} $$

$$ h_{SPE} = \frac{(\mathrm{tr}[\Sigma\Upsilon_{SPE}])^2}{\mathrm{tr}[(\Sigma\Upsilon_{SPE})^2]} \tag{18} $$

where $\Sigma$ is the correlation matrix of the data $X$ and $\mathrm{tr}[\cdot]$ denotes the trace of a matrix.

4) Combined index: In practice, a single index is preferred to two for monitoring the process. Since the SPE and $T^2$ statistics behave in a complementary manner, they can be combined to simplify the fault detection task [5]:

$$ \varphi(k) = \frac{SPE(k)}{\delta_\alpha^2} + \frac{T^2(k)}{T_\alpha^2} = x^T(k)\,\Upsilon_\varphi\, x(k) \tag{19} $$

$$ \Upsilon_\varphi = \frac{\tilde{P}\tilde{P}^T}{\delta_\alpha^2} + \frac{\hat{P}\hat{\Lambda}^{-1}\hat{P}^T}{T_\alpha^2} \tag{20} $$

The distribution of $\varphi$ can be approximated by $g_\varphi \chi^2_{h_\varphi}$, with:

$$ g_\varphi = \frac{\mathrm{tr}[(\Sigma\Upsilon_\varphi)^2]}{\mathrm{tr}[\Sigma\Upsilon_\varphi]} \tag{21} $$

$$ h_\varphi = \frac{(\mathrm{tr}[\Sigma\Upsilon_\varphi])^2}{\mathrm{tr}[(\Sigma\Upsilon_\varphi)^2]} \tag{22} $$

Then, for a given significance level $\alpha$, the process is considered normal if:

$$ \varphi(k) \le \zeta_\alpha^2 = g_\varphi\, \chi^2_{h_\varphi,\alpha} \tag{23} $$

III. FAULT RECONSTRUCTION AND A GENERALIZED VRE CRITERION

The reconstruction approach assumes that a group of variables may be faulty and reconstructs the assumed faulty variables from the remaining variables using the PCA model.

A. Fault Reconstruction Approach

Reconstruction consists in estimating the reconstructed vector by eliminating the effect of the faults along the fault directions. The sample vector under normal operating conditions is denoted by $x^*$; it is unknown when a fault has occurred. In the presence of a multidimensional process fault $R$, the sample vector $x$ is represented by [2]:

$$ x = x^* + \Xi_R f \tag{24} $$

where $R$ is a subset of $r$ among the $m$ variables that are assumed to be faulty and should be reconstructed; that is, $R$ contains the indices of the fault directions. $\|f\|$ represents the magnitude of the fault. Note that $f$ may change depending on how the actual fault develops over time. The orthonormal matrix $\Xi_R \in \mathbb{R}^{m\times r}$ indicates the reconstruction directions. It is built with 0s and 1s, where 1 marks a reconstructed variable and 0 otherwise. For unidimensional faults, $\Xi_R$ is a column vector. For example, if the data contain 5 variables and the second variable is reconstructed, then $R = \{2\}$ and $\Xi_R = \Xi_2$ is given by:

$$ \Xi_2^T = [0 \;\; 1 \;\; 0 \;\; 0 \;\; 0] $$

The fault reconstruction approach depends on the detection index $\gamma$ used. The objective is to estimate the magnitude of the fault along a given direction. Let $x_R$ be the reconstructed vector of $x$ along the $R$th direction (variable):

$$ x_R = x - \Xi_R f_R \tag{25} $$

The reconstructed detection index $\gamma_R$ that corresponds to $x_R$ is then:

$$ \gamma_R = x_R^T \Upsilon_\gamma x_R \tag{26} $$

The fault magnitude is estimated by minimizing the reconstructed detection index:

$$ f_R^{\gamma} = \arg\min_{f_R} \{\gamma_R\} = \arg\min_{f_R}\, (x - \Xi_R f_R)^T \Upsilon_\gamma (x - \Xi_R f_R) \tag{27} $$

Setting the derivative of this expression with respect to $f_R$ to zero gives the estimated fault magnitude and the corresponding reconstructed vector respectively:

$$ f_R^{\gamma} = (\Xi_R^T \Upsilon_\gamma \Xi_R)^{-1} \Xi_R^T \Upsilon_\gamma\, x \tag{28} $$

$$ x_R = \left(I - \Xi_R (\Xi_R^T \Upsilon_\gamma \Xi_R)^{-1} \Xi_R^T \Upsilon_\gamma\right) x \tag{29} $$

where $I \in \mathbb{R}^{m\times m}$ is the identity matrix. From (28) and (29), it is clear that the fault reconstruction depends on the index $\gamma$ used, which allows us to establish a generalized expression of the VRE criterion.

B. Generalized VRE Criterion

In the VRE method, the number of PCs for which the reconstruction error is minimum is considered the optimum number for modelling. The generalized variance of reconstruction error, which depends on the number of PCs $\ell$, the fault direction $R$ and the detection index $\gamma$, is given for a unidimensional fault direction by:

$$ u_R^{\gamma} = \mathrm{var}\{\Xi_R^T (x - x_R)\} = \frac{\Xi_R^T \Upsilon_\gamma \Sigma \Upsilon_\gamma \Xi_R}{(\Xi_R^T \Upsilon_\gamma \Xi_R)^2} \tag{30} $$

The simplified expressions of (30) for the detection indices above are as follows, where $\tilde{\Xi}_R = \tilde{C}\Xi_R$ and $\tilde{\Sigma} = \tilde{C}\Sigma\tilde{C}$:

$$ u_R^{SPE} = \frac{\tilde{\Xi}_R^T \tilde{\Sigma} \tilde{\Xi}_R}{(\tilde{\Xi}_R^T \tilde{\Xi}_R)^2} = \frac{\tilde{\Xi}_R^T \Sigma \tilde{\Xi}_R}{(\tilde{\Xi}_R^T \tilde{\Xi}_R)^2} \tag{31} $$

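$$ u_R^{T^2} = \frac{1}{\Xi_R^T \hat{P}\hat{\Lambda}^{-1}\hat{P}^T \Xi_R} = \frac{1}{\Xi_R^T \Upsilon_{T^2} \Xi_R} \tag{32} $$

$$ u_R^{SWE} = \frac{1}{\Xi_R^T \tilde{P}\tilde{\Lambda}^{-1}\tilde{P}^T \Xi_R} = \frac{1}{\Xi_R^T \Upsilon_{T_H^2} \Xi_R} \tag{33} $$

$$ u_R^{\varphi} = \frac{\tilde{\Xi}_R^T \tilde{\Sigma} \tilde{\Xi}_R}{\left(\tilde{\Xi}_R^T \tilde{\Xi}_R + \frac{\delta_\alpha^2}{T_\alpha^2}\, \Xi_R^T \Upsilon_{T^2} \Xi_R\right)^2} + \frac{\Xi_R^T \Upsilon_{T^2} \Xi_R}{\left(\frac{T_\alpha^2}{\delta_\alpha^2}\, \tilde{\Xi}_R^T \tilde{\Xi}_R + \Xi_R^T \Upsilon_{T^2} \Xi_R\right)^2} \tag{34} $$

Considering all possible faults $R = \{1, \ldots, m\}$ and a given index $\gamma$, the VRE criterion to be minimized with respect to $\ell$ is defined as:

$$ \min_{\ell}\, VRE_\gamma(\ell), \qquad VRE_\gamma(\ell) = \sum_{R=1}^{m} u_R^{\gamma} \tag{35} $$

Remarks:
• From (32), $u_R^{T^2}$ is inversely proportional to $\Xi_R^T \Upsilon_{T^2} \Xi_R$. The characteristic matrix $\Upsilon_{T^2} = \sum_{a=1}^{\ell} p_a p_a^T / \lambda_a$ gains a positive semi-definite term each time $\ell$ increases. Consequently, $u_R^{T^2}$ decreases monotonically as $\ell$ increases.
• From (33), $u_R^{SWE}$ is likewise inversely proportional to $\Xi_R^T \Upsilon_{T_H^2} \Xi_R$. The corresponding characteristic matrix (12), $\Upsilon_{T_H^2} = \sum_{a=\ell+1}^{m} p_a p_a^T / \lambda_a$, loses a term each time $\ell$ increases. Therefore, $u_R^{SWE}$ increases monotonically as $\ell$ increases.

Hence, from the remarks above, the best reconstruction for $VRE_{T^2}$ and $VRE_{SWE}$ is always obtained at a bound of the range of $\ell$: by retaining all PCs for $VRE_{T^2}$, and as few PCs as possible for $VRE_{SWE}$. Consequently, these two criteria cannot determine the optimum number of PCs. In the sequel, we consider the $VRE_{SPE}$ and $VRE_\varphi$ criteria. The $VRE_{SPE}$ criterion, based on the SPE statistic, is well known [2], [4], and it is the only VRE criterion proposed in the literature. The main contribution of this paper consists in studying the VRE of the combined index; the expression to be minimized with respect to $\ell$ becomes:

$$ \min_{\ell}\, VRE_\varphi(\ell), \qquad VRE_\varphi(\ell) = \sum_{R=1}^{m} u_R^{\varphi} \tag{36} $$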
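To make (28)-(36) concrete, the following is a minimal NumPy sketch, not the authors' implementation, for unidimensional directions $\Xi_R = e_r$. It builds either $\Upsilon_{SPE}$ (15) or $\Upsilon_\varphi$ (20), checks numerically that $u_R^{\gamma}$ of (30) equals the sample variance of the estimated fault magnitude $f_R^{\gamma}$ of (28), and sweeps $\ell = 1, \ldots, m-1$ to minimize (35) or (36). Assumptions: the $\chi^2$ quantiles in (10) and (16) are replaced by the Wilson-Hilferty approximation (so only NumPy and the standard library are needed), $\alpha$ is treated as an upper-tail significance level, and all function names and the four-variable data set are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def chi2_upper(h, alpha):
    """Upper-alpha quantile of chi^2_h (h may be non-integer), using the
    Wilson-Hilferty approximation as a stand-in for the exact quantile."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * h)
    return h * (1.0 - c + z * np.sqrt(c)) ** 3


def box_limit(Sigma, Ups, alpha):
    """Control limit g * chi^2_{h,alpha}, with g and h from eqs. (17)-(18)."""
    SU = Sigma @ Ups
    tr1, tr2 = np.trace(SU), np.trace(SU @ SU)
    return (tr2 / tr1) * chi2_upper(tr1 ** 2 / tr2, alpha)


def upsilon(Sigma, ell, alpha=None):
    """Characteristic matrix of SPE (eq. 15) if alpha is None, otherwise of
    the combined index (eq. 20). Requires 1 <= ell <= m - 1."""
    lam, P = np.linalg.eigh(Sigma)
    order = np.argsort(lam)[::-1]                     # decreasing eigenvalues
    lam, P = lam[order], P[:, order]
    P_pc, P_res = P[:, :ell], P[:, ell:]
    C_res = P_res @ P_res.T                           # C~ = P~ P~^T
    if alpha is None:
        return C_res
    Ups_T2 = P_pc @ np.diag(1.0 / lam[:ell]) @ P_pc.T   # eq. (9)
    delta2 = box_limit(Sigma, C_res, alpha)             # eq. (16)
    T2_lim = chi2_upper(ell, alpha)                     # eq. (10)
    return C_res / delta2 + Ups_T2 / T2_lim


def reconstruct(x, Ups, r):
    """Eqs. (28)-(29) for the single direction Xi_R = e_r (0-based r)."""
    f = (Ups @ x)[r] / Ups[r, r]       # (Xi^T Ups Xi)^-1 Xi^T Ups x
    x_rec = x.copy()
    x_rec[r] -= f                      # x_R = x - Xi_R f_R
    return f, x_rec


def vre_terms(Sigma, Ups):
    """u_R of eq. (30) for R = {1, ..., m}; with Xi_R = e_r the quadratic
    forms reduce to diagonal entries."""
    return np.diag(Ups @ Sigma @ Ups) / np.diag(Ups) ** 2


def select_ell(Sigma, alpha=None):
    """Minimizer of VRE_SPE (35) or VRE_phi (36) over ell = 1, ..., m - 1."""
    m = Sigma.shape[0]
    scores = [vre_terms(Sigma, upsilon(Sigma, ell, alpha)).sum()
              for ell in range(1, m)]
    return 1 + int(np.argmin(scores)), scores


# Hypothetical data: X1, X2 sources, X3 = X1 + X2, X4 independent, small noise.
rng = np.random.default_rng(1)
t = rng.standard_normal((450, 3))
X = np.column_stack([t[:, 0], t[:, 1], t[:, 0] + t[:, 1], t[:, 2]])
X += 0.02 * rng.standard_normal(X.shape)
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # zero mean, unit variance
Sigma = Xs.T @ Xs / (len(Xs) - 1)                    # correlation matrix

# Eq. (30) read as a variance: u_R equals the sample variance of f_R (R = {1}).
Ups = upsilon(Sigma, ell=2, alpha=0.15)
f_all = np.array([reconstruct(x, Ups, 0)[0] for x in Xs])
print(vre_terms(Sigma, Ups)[0], f_all.var(ddof=1))

print("VRE_SPE selects", select_ell(Sigma)[0], "PCs")
print("VRE_phi selects", select_ell(Sigma, alpha=0.15)[0], "PCs")
```

The first two printed numbers should agree up to floating-point rounding, because $\Sigma$ is computed here as the sample covariance of the scaled data. For exact limits, `scipy.stats.chi2.ppf(1 - alpha, h)` can replace `chi2_upper`; the approximation degrades (and can even turn negative) for very small $h$ combined with large $\alpha$.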

The minima of the studied criteria are obtained numerically on the simulation examples of the next section.

IV. CASE STUDY: SIMULATION EXAMPLES

In this section, we simulate an academic data example [6], initially composed of the first nine variables shown in Table I. To assess the effectiveness of the $VRE_{SPE}$ and $VRE_\varphi$ criteria, these data are augmented with new variables. $\upsilon$, $X_8$, $X_9$ and $X_{11}$ are uncorrelated normal random variables with zero mean and unit variance. To account for measurement noise, uncorrelated normally distributed random signals with zero mean and a standard deviation of 0.02 are superimposed on each generated variable. The data consist of $N = 450$ samples generated under normal operating conditions (without faults). They contain linear and nonlinear analytical redundancy relations, as well as independent variables. These relations are described in the second column of Table I, where "Ind." marks an independent variable, "S.R.{*}" a source of redundancy for the listed variables, and "D.{*}" a variable that depends on the listed variables. For example, in the second row of Table I, "S.R.{4,5,6}" means that $X_2$ is a source of redundancy for the fourth, fifth and sixth variables. Equivalently, the fourth variable depends on the second one, which is why "D.{1,2}" appears in the fourth row of the same table. Consequently, when the variables $X_2$, $X_4$, $X_5$ and $X_6$ are grouped in the same data set, they represent one PC in the PCA model. To identify a group of variables that together represent one PC, one must find their source of redundancy. In addition, each uncorrelated (independent) variable represents one PC in the PCA model. For this reason, the third column of Table I contains "1" if the corresponding variable is independent or a source of redundancy, and "0" otherwise. The theoretical optimum number of PCs of each data set can then be read in the fourth column of Table I. For example, the data set made of variables $X_1$ to $X_6$ contains 3 PCs, and the whole data set ($X_1$ to $X_{14}$) contains 7 PCs. We build several data sets to assess the performance of the VRE criteria in selecting the optimum number of PCs. The first data set, D1, consists of the first four variables; D2 is obtained by adding $X_5$ to D1, and so on. In this way, 11 data sets are obtained (Table II).

When the PCA model is optimal, an independent variable, together with any fault affecting it, must be projected entirely onto the PCS, because it represents an important PC. For the correlated variables, only their normal variations and possibly part of their faults should be projected onto the PCS, while their measurement noise and the remaining parts of their faults are projected onto the RS. If fewer PCs are selected than required, an independent variable can be projected onto the RS; in that case, it is treated as correlated with other variables.
Conversely, if more PCs than necessary are selected, the model is over-parameterized and includes noise. The correlations between variables are then weakened, because as the retained number of PCs increases, the projection matrices $\hat{C}$ and $\tilde{C}$ tend to the identity matrix and the zero matrix respectively. In either case, the system is not represented optimally. To illustrate these interpretations, consider data set D5 (Table II), which contains one independent variable ($X_8$) and three redundancies. The theoretical number of PCs required to represent the system adequately is therefore 4 (Table I). Fig. 1 presents three different models through their residual projection matrices. If the retained number of PCs is 3, the variable $X_8$ is considered highly correlated, because its diagonal coefficient in $\tilde{C}(\ell = 3)$ is 0.9875; consequently, this variable is projected almost entirely onto the RS. When the optimum number of PCs is selected ($\ell = 4$), this coefficient tends to zero, which implies that $X_8$ is treated as independent. When the number of PCs is increased further ($\ell = 6$), this variable remains independent, but the variable $X_1$ now also tends to appear independent. This example shows that the system cannot be represented adequately when the retained number of PCs is not optimum.

In the following paragraphs, we study the effectiveness of the proposed criteria in selecting this number. The classical approach assumes that the best reconstruction is obtained with the number of PCs that minimizes the $VRE_{SPE}$ criterion. However, this number is not always optimum, because this criterion does not take the uncorrelated variables into account: it indicates the number of redundancies in the data. For this reason, the authors of [2], [4] proposed the following steps to find the optimum model with the same criterion:
(i) Consider all monitored variables.
(ii) Minimize the objective function (35) with respect to $\ell$.
(iii) Discard the variables whose individual VRE (31) is greater than $\Xi_R^T \Sigma \Xi_R$; when the data are scaled to zero mean and unit variance, $\Xi_R^T \Sigma \Xi_R = 1$.
(iv) Repeat the analysis without the discarded variables; if the VRE (35) remains the same, these variables are independent.

In this paper, we propose a new procedure using the same criterion, which computes the optimum number of PCs directly by minimizing the objective function (35) while taking all variables into account. Its steps are:
(i) Consider all monitored variables.
(ii) Minimize the objective function (35) with respect to $\ell$, setting to zero the individual VREs (31) that are greater than $\Xi_R^T \Sigma \Xi_R$.
The number of PCs corresponding to the minimum of (35) is the optimal number. These steps are summarized by the following equation:

$$ \min_{\ell}\, VRE_{SPE}(\ell) = \min_{\ell} \sum_{\substack{R=1 \\ u_R^{SPE} < \Xi_R^T \Sigma \Xi_R}}^{m} u_R^{SPE} \tag{37} $$

The second proposed approach determines the minimum of $VRE_\varphi$ with respect to $\ell$, that is, the best reconstruction based on the combined index. Through (34), it depends on the significance levels of the control limits of the SPE and $T^2$ indices. An important feature of this approach is that it has a minimum corresponding to the best reconstruction in both subspaces.

To assess the effectiveness of the proposed criteria, we consider the data sets described in Table II. Since the measurement noise is random, each data set was simulated 500 times and the criteria were computed for each simulation; their average performance in selecting the best number of PCs over the 500 simulations was then determined. The second column of Table II shows that the classical $VRE_{SPE}$ criterion is unable to determine the true number when there are fewer than two redundancy relations between the variables: D1 really contains two sources of redundancy, whereas the simulation results of this criterion indicate only one. In the other data sets, this criterion determines the true number of redundancies in 100% of the runs. Taking the independent variables into account, the improved criterion (37) gives promising results in selecting the optimum number of PCs: it selects it in at least 99.6% of the runs for every data set (third column of Table II).
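The classical objective (35) with $\gamma = SPE$ and the improved objective (37) differ only in which individual VREs (31) enter the sum. The sketch below is a hypothetical NumPy illustration of that difference, not the authors' code; it assumes unidimensional directions and a correlation matrix $\Sigma$ (so that $\Xi_R^T \Sigma \Xi_R = \Sigma_{rr} = 1$), and the function names and four-variable data set are invented for illustration. The numbers of PCs it prints depend on the random draw.

```python
import numpy as np


def spe_vre_terms(Sigma, ell):
    """Individual VREs u_R^SPE of eq. (31) for R = {1, ..., m}, 1 <= ell < m."""
    lam, P = np.linalg.eigh(Sigma)
    P = P[:, np.argsort(lam)[::-1]]              # decreasing eigenvalues
    C_res = P[:, ell:] @ P[:, ell:].T            # C~ of eq. (15)
    return np.diag(C_res @ Sigma @ C_res) / np.diag(C_res) ** 2


def vre_spe_classical(Sigma, ell):
    """Objective (35) with gamma = SPE: sum of all individual VREs."""
    return spe_vre_terms(Sigma, ell).sum()


def vre_spe_improved(Sigma, ell):
    """Objective (37): individual VREs that are not below
    Xi_R^T Sigma Xi_R = Sigma[r, r] are set to zero before summing."""
    u = spe_vre_terms(Sigma, ell)
    return u[u < np.diag(Sigma)].sum()


def select_ell(Sigma, criterion):
    """Number of PCs minimizing the given criterion over ell = 1, ..., m - 1."""
    m = Sigma.shape[0]
    scores = [criterion(Sigma, ell) for ell in range(1, m)]
    return 1 + int(np.argmin(scores))


# Hypothetical data: X1, X2 sources, X3 = X1 + X2, X4 independent.
rng = np.random.default_rng(2)
t = rng.standard_normal((450, 3))
X = np.column_stack([t[:, 0], t[:, 1], t[:, 0] + t[:, 1], t[:, 2]])
X += 0.02 * rng.standard_normal(X.shape)
Sigma = np.corrcoef(X, rowvar=False)

print("classical (35):", select_ell(Sigma, vre_spe_classical), "PCs")
print("improved  (37):", select_ell(Sigma, vre_spe_improved), "PCs")
```

Because every $u_R^{SPE}$ is positive, the improved objective can never exceed the classical one for the same $\ell$; it only removes the terms of variables that the current model cannot reconstruct better than their own variance.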

Applying the second new approach (36), based on the combined index, gives the results in the fourth column of Table II. With an appropriate significance level, this criterion selects the optimum number of PCs in more than 90% of the runs for every data set. The remaining difficulty with this approach is the choice of the best value of the significance level $\alpha$. We therefore evaluate the efficiency of the $VRE_\varphi$ criterion while varying $\alpha$ from 1% to 99% in steps of 1%; in this case, each data set is simulated 100 times and the performance of the criterion is computed. For significance levels between 0.1 and 0.25, the $VRE_\varphi$ criterion performs very well, with performances above 90% (Fig. 2). Without loss of generality, we consider data set D6 to illustrate the results given by the three criteria (Fig. 3). The numbers of PCs corresponding to the minima of the improved $VRE_{SPE}$ (Fig. 3(b)) and of $VRE_\varphi$ (Fig. 3(c) and (d)) are the optimum number for the PCA model of D6, whereas the minimum of the classical $VRE_{SPE}$ (Fig. 3(a)) gives the number of redundancies.

V. CONCLUSIONS

In this paper, we have proposed two new approaches for determining the optimum number of principal components needed to represent a system adequately by a PCA model. Both approaches are based on the variance of reconstruction error (VRE) criterion. Our main contribution is a generalization of the VRE criterion to any quadratic detection index. The effectiveness of the proposed approaches has been illustrated on an academic example.


 0.6586  0.0620   0.0456  -0.2145 ɶC(ℓ=3) =   -0.2977   -0.2867  -0.0443  -0.0117 

0.0620 0.0456 -0.2145 -0.2977 -0.2867 -0.0443 -0.0117  0.4999 -0.0056 -0.2842 0.3772 -0.1456 0.0106 -0.0419  -0.0056 0.4795 0.0298 0.0365 0.0387 -0.4897 -0.0628   -0.2842 0.0298 0.6517 0.0266 -0.3108 -0.0266 -0.0365  0.3772 0.0365 0.0266 0.5268 -0.1193 -0.0418 0.0187   -0.1456 0.0387 -0.3108 -0.1193 0.6792 -0.0367 -0.0277  0.0106 -0.4897 -0.0266 -0.0418 -0.0367 0.5167 -0.0635   -0.0419 -0.0628 -0.0365 0.0187 -0.0277 -0.0635 0.9875 

 0.6585  0.0615   0.0448  -0.2149 ɶC(ℓ=4)=   -0.2974   -0.2871  -0.0450  0.0004 

0.0615 0.0448 -0.2149 -0.2974 -0.2871 -0.0450 0.4981 -0.0084 -0.2857 0.3780 -0.1467 0.0079

 0.0245  0.0058   0.0224  0.0908 ɶ ℓ=6)=  C(  0.0277   -0.1174  -0.0234   0.0001 

Fig. 1.

-0.0084 0.4755 0.0275 0.0376 0.0370 -0.4937 -0.2857 0.0275 0.6503 0.0273 -0.3118 -0.0290 0.3780 0.0376 0.0273 0.5265 -0.1188 -0.0406 -0.1467 0.0370 -0.3118 -0.1188 0.6785 -0.0384 0.0079 -0.4937 -0.0290 -0.0406 -0.0384 0.5127 0.0005 0.0003 0.0001 0.0004 -0.0007 -0.0003

0.0058 0.0022 -0.0145 0.0234

0.0224 -0.0145 0.4747 0.0407

0.0908 0.0234 0.0407 0.3406

0.0277 0.0057 0.0457 0.1010

-0.1174 -0.0344 0.0439 -0.4496

-0.0234 0.0151 -0.4929 -0.0424

0.0057 -0.0344 0.0151 0.0000

0.0457 0.0439 -0.4929 0.0003

0.1010 0.0323 -0.1263 -0.0475 -0.4496 -0.1263 0.6138 -0.0453 -0.0424 -0.0475 -0.0453 0.5119 0.0005 0.0002 -0.0006 -0.0003

0.0004  0.0005  0.0003   0.0001  0.0004   -0.0007  -0.0003   0.0000 

0.0001  0.0000  0.0003   0.0005  0.0002   -0.0006  -0.0003   0.0000 

Three different residual projection matrices for data D5
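As a reading aid for Fig. 1: an orthogonal projector onto an $(m-\ell)$-dimensional subspace has trace $m - \ell$, so for D5 ($m = 8$) the diagonals of $\tilde{C}$ should sum to 5, 4 and 2 for $\ell = 3, 4, 6$. The short check below uses only the diagonal values read from Fig. 1 and isolates the coefficient of $X_8$ discussed in the text; the dictionary layout is an illustrative choice, not something taken from the paper.

```python
import numpy as np

# Diagonal entries of C~ read from Fig. 1 (data set D5, variables X1..X8).
diag_C = {
    3: [0.6586, 0.4999, 0.4795, 0.6517, 0.5268, 0.6792, 0.5167, 0.9875],
    4: [0.6585, 0.4981, 0.4755, 0.6503, 0.5265, 0.6785, 0.5127, 0.0000],
    6: [0.0245, 0.0022, 0.4747, 0.3406, 0.0323, 0.6138, 0.5119, 0.0000],
}
m = 8

for ell, d in diag_C.items():
    d = np.asarray(d)
    # trace(C~) = m - ell for a projector of rank m - ell
    print(f"ell={ell}: trace={d.sum():.4f} (expected {m - ell}), "
          f"C~[X8, X8]={d[7]:.4f}")
```

A diagonal entry close to 1 means the variable lies almost entirely in the RS (the model treats it as reconstructible from the others), while an entry close to 0 means it is carried by the PCS, which is how the text reads the behaviour of $X_8$ and $X_1$.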

[Figure: panels plotting the percentage (%) of runs in which $VRE_\varphi$ selects the indicated number of PCs versus the significance level $\alpha$: 3 PCs for D2, 3 PCs for D4, 4 PCs for D6, 6 PCs for D8, 7 PCs for D9, 7 PCs for D10 and 7 PCs for D11.]

Fig. 2. Performance of $VRE_\varphi$ in selecting the optimum number of PCs for several data sets, as a function of the significance level $\alpha$

[Figure: four panels plotting the VRE criteria against the number of PCs for data set D6.]

Fig. 3. The VRE criteria for D6: (a) classical $VRE_{SPE}$, (b) improved $VRE_{SPE}$ (37), (c) $VRE_\varphi$ ($\alpha = 10\%$) and (d) $VRE_\varphi$ ($\alpha = 20\%$)

REFERENCES

[1] M. Tamura and S. Tsujita, "A study on the number of principal components and sensitivity of fault detection using PCA," Computers & Chemical Engineering, vol. 31, no. 9, 2007, pp. 1035-1046.
[2] S.J. Qin and R. Dunia, "Determining the number of principal components for best reconstruction," Journal of Process Control, vol. 10, no. 2, 2000, pp. 245-250.
[3] S.J. Qin, "Statistical process monitoring: basics and beyond," Journal of Chemometrics, vol. 17, no. 8-9, 2003, pp. 480-502.
[4] S. Valle, W. Li and S.J. Qin, "Selection of the number of principal components: the variance of the reconstruction error criterion with a comparison to other methods," Ind. Eng. Chem. Res., vol. 38, no. 11, 1999, pp. 4389-4401.
[5] H.H. Yue and S.J. Qin, "Reconstruction-based fault identification using a combined index," Ind. Eng. Chem. Res., vol. 40, no. 20, 2001, pp. 4403-4414.
[6] Y. Tharrault, G. Mourot, J. Ragot and D. Maquin, "Fault detection and isolation with robust principal component analysis," Int. J. Appl. Math. Comput. Sci., vol. 18, no. 4, 2008, pp. 429-442.
[7] G.E.P. Box, "Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification," Ann. Math. Statist., vol. 25, no. 2, 1954, pp. 290-302.
[8] P. Nomikos and J.F. MacGregor, "Multivariate SPC charts for monitoring batch processes," Technometrics, vol. 37, no. 1, 1995, pp. 41-59.
[9] B. Mnassri, E.-M. El Adel and M. Ouladsine, "Fault localization using principal component analysis based on a new contribution to the squared prediction error," in 16th Mediterranean Conference on Control and Automation, Ajaccio, France, 2008, pp. 65-70.
[10] B. Mnassri, E.-M. El Adel, B. Ananou and M. Ouladsine, "Fault detection and diagnosis based on PCA and a new contribution plots," in 7th IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes, Barcelona, Spain, 2009, pp. 834-839.

TABLE I. SIMULATED DATA

| Variables | Characteristic | PC | cum. PCs |
|---|---|---|---|
| $X_1(k) = 1 + \upsilon(k)^2 + \sin(k/3)$ | S.R.{4,5,6,7} | 1 | 1 |
| $X_2(k) = 2\sin(k/6)\cos(k/4)\exp(-k/N)$ | S.R.{4,5,6} | 1 | 2 |
| $X_3(k) = \log(X_2(k)^2)$ | S.R.{7} | 1 | 3 |
| $X_4(k) = X_1(k) + X_2(k)$ | D.{1,2} | 0 | 3 |
| $X_5(k) = X_1(k) - X_2(k)$ | D.{1,2} | 0 | 3 |
| $X_6(k) = 2X_1(k) + X_2(k)$ | D.{1,2} | 0 | 3 |
| $X_7(k) = X_1(k) + X_3(k)$ | D.{1,3} | 0 | 3 |
| $X_8(k) \sim \mathcal{N}(0,1)$ | Ind. | 1 | 4 |
| $X_9(k) \sim \mathcal{N}(0,1)$ | Ind. | 1 | 5 |
| $X_{10}(k) = X_2(k) + 3X_3(k)$ | D.{2,3} | 0 | 5 |
| $X_{11}(k) \sim \mathcal{N}(0,1)$ | S.R.{14} | 1 | 6 |
| $X_{12}(k) = \cos(X_2(k))\cos(k/6)$ | S.R.{13} | 1 | 7 |
| $X_{13}(k) = 2X_{12}(k)$ | D.{12} | 0 | 7 |
| $X_{14}(k) = 2 + 3X_{11}(k)$ | D.{11} | 0 | 7 |

TABLE II. PERCENT PERFORMANCE OF STUDIED VRE CRITERIA

| Data sets | $VRE_{SPE}$ (35) | Improved $VRE_{SPE}$ (37) | $VRE_\varphi$ (36) |
|---|---|---|---|
| D1 = {$X_1$ to $X_4$} | 100%: 1 PC | 100%: 3 PCs | 100%: 3 PCs |
| D2 = {$X_1$ to $X_5$} | 100%: 2 PCs | 100%: 3 PCs | 99%: 3 PCs |
| D3 = {$X_1$ to $X_6$} | 100%: 2 PCs | 99.6%: 3 PCs | 100%: 3 PCs |
| D4 = {$X_1$ to $X_7$} | 100%: 3 PCs | 99.8%: 3 PCs | 100%: 3 PCs |
| D5 = {$X_1$ to $X_8$} | 100%: 3 PCs | 99.8%: 4 PCs | 95.6%: 4 PCs |
| D6 = {$X_1$ to $X_9$} | 100%: 3 PCs | 100%: 4 PCs | 99.2%: 4 PCs |
| D7 = {$X_1$ to $X_{10}$} | 100%: 3 PCs | 100%: 5 PCs | 95.6%: 5 PCs |
| D8 = {$X_1$ to $X_{11}$} | 100%: 3 PCs | 100%: 6 PCs | 100%: 6 PCs |
| D9 = {$X_1$ to $X_{12}$} | 100%: 3 PCs | 100%: 7 PCs | 97.8%: 7 PCs |
| D10 = {$X_1$ to $X_{13}$} | 100%: 4 PCs | 100%: 7 PCs | 99.4%: 7 PCs |
| D11 = {$X_1$ to $X_{14}$} | 100%: 5 PCs | 100%: 7 PCs | 90.8%: 7 PCs |
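The simulated data of Table I can be regenerated with a short script. The sketch below is a hypothetical NumPy reconstruction of the simulation described in Section IV, not the authors' code. Assumptions that the text does not state: $k$ runs from 1 to $N$ (which keeps $X_2(k) \neq 0$, so $\log(X_2^2)$ stays finite), $\log$ is the natural logarithm, the arguments are read as $k/3$, $k/6$, $k/4$ and $-k/N$, and the 0.02-standard-deviation noise is added after all relations have been built. The last line checks that the centred noise-free data have rank 7, the cumulative number of PCs that Table I gives for the whole data set.

```python
import numpy as np


def simulate_table1(N=450, noise_std=0.02, seed=None):
    """Return (noise-free, noisy) N x 14 matrices following Table I."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, N + 1)                     # assumption: k = 1, ..., N
    v = rng.standard_normal(N)                  # upsilon ~ N(0, 1)
    X = np.empty((N, 14))
    X[:, 0] = 1 + v ** 2 + np.sin(k / 3)                          # X1
    X[:, 1] = 2 * np.sin(k / 6) * np.cos(k / 4) * np.exp(-k / N)  # X2
    X[:, 2] = np.log(X[:, 1] ** 2)                                # X3
    X[:, 3] = X[:, 0] + X[:, 1]                                   # X4
    X[:, 4] = X[:, 0] - X[:, 1]                                   # X5
    X[:, 5] = 2 * X[:, 0] + X[:, 1]                               # X6
    X[:, 6] = X[:, 0] + X[:, 2]                                   # X7
    X[:, 7] = rng.standard_normal(N)                              # X8
    X[:, 8] = rng.standard_normal(N)                              # X9
    X[:, 9] = X[:, 1] + 3 * X[:, 2]                               # X10
    X[:, 10] = rng.standard_normal(N)                             # X11
    X[:, 11] = np.cos(X[:, 1]) * np.cos(k / 6)                    # X12
    X[:, 12] = 2 * X[:, 11]                                       # X13
    X[:, 13] = 2 + 3 * X[:, 10]                                   # X14
    # Measurement noise superimposed on every generated variable
    X_noisy = X + noise_std * rng.standard_normal(X.shape)
    return X, X_noisy


X_clean, X = simulate_table1(seed=0)
D5 = X[:, :8]          # data set D_j holds the first j + 3 variables
# Rank of the centred noise-free data = number of sources in Table I
print(np.linalg.matrix_rank(X_clean - X_clean.mean(axis=0)))
```

Centring matters for the rank check: $X_{14} = 2 + 3X_{11}$ is affine rather than linear in $X_{11}$, so without centring the constant offset would add one spurious dimension.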