IMA Journal of Management Mathematics, doi:10.1093/imaman/dpnxxx
Drift Detection and Characterization for Condition Monitoring: Application to Dynamical Systems with Unknown Failure Modes

Antoine Chammas, Moussa Traore, Eric Duviella, Moamar Sayed-Mouchaweh and Stéphane Lecoeuche

Univ Lille Nord de France, F-59000 Lille, France; EMDouai, IA, F-59500 Douai, France.
[antoine.chammas, eric.duviella, moamar.sayed-mouchaweh, stephane.lecoeuche]@mines-douai.fr,
[email protected]

In this paper, a condition monitoring architecture for dynamical systems with unknown failure modes is proposed. The architecture is based on supervision, diagnosis and prognosis modules. The considered faults are slowly evolving gradual faults, also known as drifts. The approach consists in extracting indicators for drift detection and characterization. Moreover, in many real applications, dysfunctional analysis techniques do not allow the determination of the complete list of failures that may impact a system. Our proposed architecture allows us to update this a priori analysis. The architecture is based on a dynamical clustering algorithm which allows a continuous update of the operating modes (normal and degraded) of the system. The method is applied on a case study of a tank system.

Keywords: drift, condition monitoring, maintenance, prognosis, dynamical clustering, dynamical systems.
1. Introduction

The main objective of the predictive maintenance policy is to improve the availability and the reliability of industrial processes. A few years ago, a unified architecture for predictive maintenance, Condition-Based Maintenance (CBM), was proposed by Lebold & Thurston (2001). It is based on six modules gathering a set of tools which are necessary to perform the predictive maintenance tasks. Inside the CBM structure, there exists a supervision system that determines the equipment's health (Muller et al., 2008a; Vachtsevanos et al., 2006). It is designed to provide information when the process is subject to an incipient fault, also known as a drift. In this situation, the process passes gradually from normal to failure through an intermediate state called the degraded state (Isermann, 2005). Byington et al. (2003) proposed the OSA-CBM, based on an Open System Architecture (OSA), to provide a generic architecture for the development of prognostic systems. In (Muller et al., 2008b), an integrated system of proactive maintenance (ISPM) based on three modules, i.e. supervision, prognosis and aid to decision making, is proposed. Whatever the maintenance strategy, a preliminary dysfunctional analysis of the process, such as FMECA (Failure Modes, Effects and Criticality Analysis), is generally required in order to determine the critical components of the system which have to be supervised. From the FMECA, failure trees of the process can be built to highlight graphically the propagation of failures through the process (Vesely et al., 1981). Then, static Probability Functions by Episode (PFE) can be associated to the elements of the failure tree in order to determine the occurrence risks of failures (Desinde et al., 2006). In (Traore
et al., 2009; Chammas et al., 2011), supervision and prognostic tools have been developed to generate dynamic PFE. However, these methods have been proposed under the assumption that all the failures are identified and well defined. In fact, several failures which are not characterized a priori in the FMECA analysis can occur during the operational time of a process. Thus, it is necessary to propose a methodology to detect drifts towards unknown failure modes, to give information to the process supervisor, and to update the dysfunctional analysis of the process. This fits well with the need to develop an efficient and complete risk assessment approach. Additionally, CBM also relies on a prognosis module. The reason is that a prognosis module provides, via a long-term prediction of the fault evolution, information on the time at which a process will no longer perform its intended function. Knowing the Remaining Useful Life (RUL) is crucial for maintenance decision making. In a general way, prognosis approaches rely on indicators that reflect the evolution of the drift in order to estimate the RUL. Prognosis approaches can be divided into three categories (Sikorska et al., 2011; Vachtsevanos et al., 2006; Byington et al., 2002; Muller et al., 2008a) according to the defect evolution model: model-based prognosis, data-driven prognosis and experience-based prognosis. In model-based prognosis, the evolution law of the drift can be characterized by a physical model (Jardine et al., 2006). It can be directly related to time, such as the crack growth length law (the Paris rule) and the linear pneumatic erosion model (Meeker & Escobar, 1998). It can also be analytically related to observable variables of a process; for example, in (Oppenheimer & Loparo, 2002), a physical relationship between fault severity and machine vibration signals was used. In the experience-based approach, prognosis is based on the evaluation of a reliability function or of a stochastic deterioration function. A common example of a reliability function is the bathtub curve, which is expressed using a sum of several Weibull distributions. A common example of a stochastic deterioration function is the gamma function (Noortwijk, 2009). Finally, data-driven approaches are used when no physical model of the defect evolution is available. They can be roughly divided into statistical methods and neural network based methods (Peysson et al., 2008; Brotherton et al., 2000). Statistical methods for prognosis are based on condition monitoring (CM) measurements related to the health condition or state of the physical asset. There are two types of CM measurements: direct CM (DCM) measurements and indirect CM (ICM) measurements (Si et al., 2011; Jardine et al., 2006). The difference is that in the former the degradation state is directly observable, whereas in the latter the degradation state must be reconstructed. In the case of DCM, estimating the RUL amounts to estimating the time for the CM measurements to reach a pre-defined threshold level. In the case of ICM, it is possible to extract meaningful indicators that are able to represent the health state of a system. Then, the estimation of the RUL is done by exploiting these indicators. The idea is to evaluate their trajectory over a future horizon (Peysson et al., 2008) until a failure situation is reached. In order to project the health indicator over future horizons, time series prediction models can be used, such as exponential smoothing and regression based models. In Peysson et al.
(2009), exponential smoothing was used on sensor measurements to estimate degradation trends. In (Yan et al., 2002), a logistic regression was used to assess the machine condition and an ARMA (auto-regressive moving average) model was used for the prediction of the degraded performance of the machine. In (Li et al., 2010), faults were reconstructed by Principal Component Analysis (PCA) on sensor measurements, followed by a de-noising step using wavelet networks; then, the fault trend was recursively identified by a recursive least squares auto-regressive (AR) model. In (Chammas et al., 2012), an architecture of supervision and diagnosis for maintenance of dynamical systems was proposed.
[Figure 1: block diagram linking a historical database and an offline model (built using dysfunctional analysis methods) to the online blocks: (1) online operating system with online generation of features; (2) dynamical adaptation of the cluster parameters (mean and covariance matrix); (3) hypothesis test for drift detection (supervision); (4) drift detected?; (5) computation of the drift characterization indicators (diagnosis): 1. direction indicator, 2. severity indicator; (6) known direction?; (7) prognosis, providing the RUL, with an alarm; (8) call for expertise and update of the dysfunctional analysis.]
FIG. 1. Architecture of supervision, diagnosis and prognosis with unknown failure modes.
In this work, the problem of unknown failure modes was considered. Indicators for the detection and characterization of drifts, for supervision and diagnosis purposes, were developed. They also allowed the distinction between known and unknown failure modes. In this paper, this work is reconsidered and developed further. In addition, an extension including a prognosis module is given. The global architecture defined in (Chammas et al., 2012) is updated in order to take all these aspects into consideration. It is based on a data-driven approach. The developed algorithm is intended to be embedded in a global CBM architecture because it provides the pertinent indicators to do so. In section 2, the proposed approach for supervision, diagnosis and prognosis is presented, and the main functionalities of the dynamical clustering algorithm are detailed. In section 3, we describe the drift detection method we use and detail the computation of the indicators characterizing the drift. In section 4, we show how we exploit these indicators for prognosis purposes. In section 5, our approach is tested on the case of a tank system, and the results are given. We conclude our work in section 6 with some perspectives.

2. Architecture of supervision, diagnosis and prognosis

Our architecture for supervision, diagnosis and prognosis is depicted in Fig. (1).
A system can have different operating modes. These are characterized as normal, degraded or failure modes. The normal operating mode corresponds to the normal functioning of a system, in which case it fulfills its task. A degraded operating mode is a functioning mode where the system still operates but not at its optimal efficiency. A failure mode corresponds to the breakdown of the system, or its end of life. Systems are equipped with sensors that produce continuous streams of data. These data are preprocessed and features are extracted from them. Features are meaningful characteristics of the operating modes of the system. They are essential for building a discriminative model for diagnosis/prognosis. Feature extraction and selection are out of the scope of this paper. These features are used to build a feature space. Let d be the number of features; thus, d is also the dimensionality of the feature space. Any d-dimensional feature vector in this space is called a pattern and is denoted by X(t), with X ∈ R^d; X(t) is the pattern that arrives at time stamp t. A cluster is a group of similar patterns in a restricted region of the feature space.
Consider a system that is described by a normal operating mode and several failure modes. The modeling consists in estimating clusters in the feature space. Then, a decision space is deduced from the feature space by exploiting the knowledge stemming from the cluster parameters and by assigning an operating mode to each cluster. This step is made in an offline manner. It relies on the dysfunctional analysis method along with the availability of a historical database. This analysis is carried out according to expert knowledge of the process.
A drift is a gradual change in the distribution of patterns that is mainly caused by an incipient fault. For this reason, dynamical clustering appears to be an appropriate way to follow the evolution of a drift, because it allows continuous updates of the distribution of patterns, and thus of the decision space (Boubacar et al., 2005, 2004; L.Iverson, 2004; Boukharouba et al., 2009). The idea behind this approach is to iteratively update the parameters of the clusters in the feature space as new data arrive. The clustering tool used here is an auto-adaptive classifier called AuDyC (Boubacar et al., 2004; Lecoeuche & Lurette, 2003). This allows us to follow the temporal evolution of a drift and to compute indicators for supervision, diagnosis and prognosis.
The supervision module consists in a continuous monitoring of the health state of the system. In case of a fault, the supervision module launches an alarm that indicates its occurrence. Upon detection, in the diagnosis module, indicators to isolate the fault and to assess the current state of the system's health are computed. These are the drift characterization indicators, i.e. the direction indicator (for isolation) and the severity indicator (for health assessment). In case of an isolated fault (known fault), the prognosis module is used to provide an estimate of the RUL. In case of detection of an unknown failure mode, expert knowledge is called upon. In this case, an update of the knowledge on the system is required. In addition, a request is sent to the supervisor of the system in order to re-actualize its dysfunctional analysis.
2.1 Dynamical clustering algorithm
Our architecture is based on monitoring the parameters of clusters that are dynamically updated at each instant t. The dynamical clustering algorithm we use is AuDyC, which stands for Auto-Adaptive Dynamical
Clustering (Lecoeuche & Lurette, 2003; Boubacar et al., 2005). It is used to model, continuously, the operating modes of the process. It uses a technique inspired from the Gaussian mixture model. The features corresponding to different modes are modeled using Gaussian prototypes P_j characterized by a center µ_{P_j} and a covariance matrix Σ_{P_j}. Each cluster can be composed of several Gaussian prototypes, but in this work we assume that each cluster is composed of only one Gaussian prototype. A minimum number of N_win patterns is necessary to define one cluster, where N_win is a user-defined threshold. Eq. (2.1) shows the density function g(X|P_j) of a multivariate Gaussian distribution with mean µ_{P_j} and covariance matrix Σ_{P_j}:

g(X|P_j) = \frac{1}{(2\pi)^{d/2} |\Sigma_{P_j}|^{0.5}} \exp\left(-0.5\,(X - \mu_{P_j})^T \Sigma_{P_j}^{-1} (X - \mu_{P_j})\right),    (2.1)
where d is the dimensionality of the feature space. Each new feature vector X(t) is compared to the existing clusters and then assigned to the cluster that is most similar to it. The metric used to evaluate similarity to a cluster P_j is a membership function λ(X, P_j), given by:

\lambda(X, P_j) = \exp\left(-0.5\,(X - \mu_{P_j})^T \Sigma_{P_j}^{-1} (X - \mu_{P_j})\right).    (2.2)
If ∃ j ∈ {1, . . . , J} such that λ(X, P_j) > λ_m, where λ_m is a user-defined threshold, then the feature vector X is assigned to that cluster. In the opposite case, where λ(X, P_j) < λ_m for all j ∈ {1, . . . , J}, a new prototype is created. After assignment to a cluster, the latter's parameters are updated. In Fig. (2), the adaptation mechanism is graphically illustrated. The basic idea of the adaptation is to recalculate the mean and the covariance matrix of the newest N_win values forming the new cluster.
[Figure 2: at time t, the newest arrived pattern falling inside the cluster boundary (defined by the value of λ_m) is assigned to the cluster; at time t+1, the oldest pattern is forgotten and the parameters of the cluster are updated (dynamical adaptation).]
FIG. 2. Update of the parameters of the clusters by AuDyC.
The equations for the iterative computation of the mean and the covariance matrix are:

\mu_{P_j}(t) = \mu_{P_j}(t-1) + \frac{1}{N_{win}}\left(X(t) - X(t - N_{win} + 1)\right),    (2.3)

\Sigma_{P_j}(t) = \Sigma_{P_j}(t-1) + \Delta X \begin{pmatrix} \frac{1}{N_{win}} & \frac{1}{N_{win}(N_{win}-1)} \\ \frac{1}{N_{win}(N_{win}-1)} & \frac{-(N_{win}+1)}{N_{win}(N_{win}-1)} \end{pmatrix} \Delta X^T,    (2.4)

with \Delta X = \left[\,X(t) - \mu_{P_j}(t-1) \quad X(t - N_{win} + 1) - \mu_{P_j}(t-1)\,\right].
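To make this adaptation mechanism concrete, the following Python sketch shows one possible implementation of a single Gaussian prototype with the membership test of Eq. (2.2) and a sliding-window update of the mean and covariance matrix. It is only an illustrative sketch, not the AuDyC implementation: for simplicity it recomputes the windowed statistics directly rather than applying the recursive forms of Eqs. (2.3)-(2.4), and the class and parameter names (GaussianPrototype, n_win, lambda_m) are assumptions introduced here.

```python
import numpy as np
from collections import deque

class GaussianPrototype:
    """Minimal sketch of a single-prototype evolving cluster (names are illustrative)."""

    def __init__(self, n_win=30, lambda_m=0.02):
        self.n_win = n_win          # sliding-window size N_win
        self.lambda_m = lambda_m    # membership threshold lambda_m
        self.window = deque(maxlen=n_win)
        self.mu = None              # cluster center
        self.sigma = None           # cluster covariance matrix

    def membership(self, x):
        """Membership function of Eq. (2.2)."""
        diff = x - self.mu
        return float(np.exp(-0.5 * diff @ np.linalg.inv(self.sigma) @ diff))

    def update(self, x):
        """Add the newest pattern, forget the oldest one, and refresh (mu, Sigma).

        The statistics of the last N_win patterns are recomputed directly here,
        instead of using the recursive updates of Eqs. (2.3)-(2.4).
        """
        self.window.append(np.asarray(x, dtype=float))
        data = np.array(self.window)
        self.mu = data.mean(axis=0)
        if len(self.window) >= 2:
            self.sigma = np.cov(data, rowvar=False)

# Illustrative online loop: assign each pattern to the evolving cluster if it is
# similar enough (lambda > lambda_m); otherwise a new prototype would be created.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    proto = GaussianPrototype(n_win=30, lambda_m=0.02)
    for t in range(200):
        x = rng.normal(loc=[2.0, 5.0], scale=0.1, size=2)  # simulated feature vector
        if proto.mu is None or len(proto.window) < proto.n_win:
            proto.update(x)                       # initialisation of the prototype
        elif proto.membership(x) > proto.lambda_m:
            proto.update(x)                       # dynamical adaptation (Fig. 2)
        else:
            pass  # here AuDyC would create a new Gaussian prototype
```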
We note that AuDyC contains many other functionalities, but we cited those that are relevant to our application. More details on the functionalities and on the rules of recursive adaptation in AuDyC can be found in (Lecoeuche & Lurette, 2003; Boubacar et al., 2004).
The modeling process using AuDyC involves an offline learning phase, followed by an online auto-adaptive phase. In the offline phase, a model is learnt and then used as an initial model in an online fashion. These steps correspond to the first three blocks in Fig. (1). To start the learning process using AuDyC, an initial database is provided. It is composed of several sets:
• S_n: the set of data corresponding to the normal operating mode. Let N_{S_n} = card(S_n) be the cardinality of S_n.
• S_{F_i}, i = 1, ..., n_f: sets of data corresponding to different failure modes, where n_f is the number of sets given.
We start the training process with an empty decision space. The first training set used for learning is S_n. The first feature vector is inserted in the database as the initial cluster, denoted C_n(t = 1). Each subsequent input feature vector is assigned to C_n(t) and the parameters (center and covariance matrix) of the latter are updated. The process continues iteratively over each feature vector of the training set S_n. After a certain time, the cluster C_n(t) converges to a region of the feature space. At this point, we define C_N = C_n(N_{S_n}), the last cluster, which covers the region corresponding to the initial normal operating mode. The cluster C_N is characterized by its mean, denoted µ_N, and its covariance matrix, denoted Σ_N. The same is done for the training sets S_{F_i}, i = 1, ..., n_f, and the result is C_{F_i}, i = 1, ..., n_f, the last clusters covering the failure modes (see Fig. (3(a))).
Once this offline model is built, it can be plugged into the online system that provides a data stream of patterns. As each pattern becomes available, it is compared to the knowledge base model constructed on the initial training set. Under normal operating conditions, features will be assigned to the current class, which is denoted by C_e(t); C_e(t) is called the evolving cluster at time t. After each assignment, the parameters (mean and covariance) of C_e(t) will be updated, i.e. the cluster is updated. We have C_e(t = 0) = C_N and, under the normal operating mode (without drift), C_e(t) ≈ C_N. After the occurrence of a drift, there are three possible outcomes (see Fig. 3(b,c,d)) for the trajectory of the mean of C_e(t) after continuous adaptation:
• towards a known region, in the case of a known fault,
• towards an unknown region, in the case of an unknown fault,
• a possible change in direction, due to multiple faults.
[Figure 3: four panels in the feature space showing the clusters C_N, C_{F_1}, C_{F_2}, the unknown regions and the trajectory of the evolving cluster C_e: (a) offline model: knowledge base; (b) drifting towards a known region; (c) drifting towards an unknown region; (d) possible change of direction of a drift.]
FIG. 3. Different possible drifting scenarios (resulting from the AuDyC adaptation mechanism).
In the last two cases, C_e(t) reaches an unknown region. After the detection of these cases, the dysfunctional analysis has to be updated to take the new faulty modes into account. This is reflected in the decision space by the appearance of a new region corresponding to the new failure mode. The interpretation of these new operating modes has to be done by experts. This newly acquired knowledge will be helpful for the optimization of maintenance strategies in the future.

3. Drift detection and characterization

A drift is the case where the process passes gradually from a normal operating mode to failure through a faulty situation. This is reflected in the feature space by a change in the parameters of the pattern distribution (Kuncheva, 2004; Zliobaite, 2009; Minku & Yao, 2011). If a drift occurs, it is necessary to detect it as soon as possible in order to alert the system user. Then, it is important to characterize the drift in order to follow its evolution. To do so, we propose the following two characterization criteria:
1. Severity: an indicator that assesses the amplitude of a drift.
2. Direction: when a drift occurs, the distribution of the patterns starts to change, and the patterns gradually move away from the initial region. When a known fault is causing the drift, the patterns tend to move towards the failure region corresponding to this fault. In the opposite case, when an unknown fault is causing the drift, the patterns tend to move towards an unknown region of the space. The predictability or direction criterion is the ability to say towards which region the patterns are moving in the feature space.
In this section, the detection and characterization of drifts are presented.
3.1 Drift detection
Methods for detecting drifts have been widely studied in the literature (Sayed-Mouchaweh & Lughofer, 2012; Pechenizkiy & Zliobaite, 2013). Examples of techniques for drift detection are statistical hypothesis tests (MvBain & Timusk, 2009; Li et al., 2010; Nishida & Yamauchi, 2007), control charts such as CUSUM or Shewhart charts (Kuncheva, 2009; Basseville & Nikiforov, 1993; Alippi et al., 2009), and fixed as well as adaptive thresholds (Gama et al., 2004; Baena-Garcia et al., 2006; Ditzler & Polikar, 2011). The methods for detecting drifts depend on the algorithm used to adapt the pattern models. In this paper, we adopt a drift detection technique that makes use of the cluster means. The assumption is that a drift will cause the evolving cluster to move away from the cluster obtained in the training phase under normal operating conditions (C_N). Since we use a parametric technique (Gaussian model) to model the data in the feature space, a hypothesis test on the means can be used to detect drifts. Let S = {X(t − N_win + 1), . . . , X(t)} be the sample of the last N_win patterns. Their mean and covariance are given by AuDyC at each instant, and they correspond to µ_e(t) and Σ_e(t). The null hypothesis is that µ_e(t) has not significantly deviated from µ_N. So we have:

H_0 : \mu_e(t) = \mu_N.    (3.1)
If there is no drift, this hypothesis must be accepted. Since the sample S contains patterns that are statistically independent and normally distributed, the Hotelling T-square statistical test can be used. The Hotelling T-square statistic, denoted by T_sq, is given by:

T_{sq}(t) = N_{win}\,(\mu_e(t) - \mu_N)^T\, \Sigma_e^{-1}(t)\, (\mu_e(t) - \mu_N).    (3.2)
If N_win is not large enough (N_win < 50), it is known that:

T_{sq} \sim \frac{d\,(N_{win} - 1)}{N_{win} - d}\, F_{(d,\, N_{win} - d)},    (3.3)

where F_{(v_1, v_2)} is the Fisher-Snedecor distribution with v_1 and v_2 degrees of freedom, and d is the dimensionality of the space. Consequently, this property will be used to compute a threshold for drift detection. In fact, at each time stamp t, the value T_sq(t) is computed. Let:
Th = \frac{d\,(N_{win} - 1)}{N_{win} - d}.    (3.4)
For a user-defined confidence level α, drift is confirmed if:
T_{sq}(t) > Th \times F_{(d,\, N_{win} - d)\,|\,\alpha}.    (3.5)
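As an illustration of this detection rule, a minimal Python sketch of the test of Eqs. (3.2)-(3.5) is given below, using the Fisher-Snedecor quantile from scipy. The function name drift_detected and the way the cluster parameters are passed in (in the architecture they are provided by AuDyC at each instant) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import f as fisher_snedecor

def drift_detected(mu_e, sigma_e, mu_n, n_win, alpha=0.995):
    """Hypothesis test for drift detection (Eqs. (3.2)-(3.5)).

    mu_e, sigma_e : mean and covariance of the evolving cluster C_e(t)
    mu_n          : mean of the normal cluster C_N learnt offline
    n_win         : number of patterns used to estimate (mu_e, sigma_e)
    alpha         : confidence level of the test
    """
    d = mu_e.shape[0]
    diff = mu_e - mu_n
    # Hotelling T-square statistic, Eq. (3.2)
    t_sq = n_win * diff @ np.linalg.inv(sigma_e) @ diff
    # Threshold Th, Eq. (3.4), scaled by the F quantile, Eq. (3.5)
    th = d * (n_win - 1) / (n_win - d)
    f_quantile = fisher_snedecor.ppf(alpha, d, n_win - d)
    return t_sq > th * f_quantile, t_sq

# Illustrative call with the parameter values used in the case study
# (N_win = 30, alpha = 99.5%); the numerical inputs are made up.
if __name__ == "__main__":
    mu_n = np.array([2.0, 5.0])
    mu_e = np.array([2.3, 5.4])
    sigma_e = np.diag([0.05, 0.08])
    detected, t_sq = drift_detected(mu_e, sigma_e, mu_n, n_win=30, alpha=0.995)
    print(f"T_sq = {t_sq:.2f}, drift detected: {detected}")
```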
3.2 Drift characterization indicators
Once a drift is detected, the drift indicators are computed at each time step. In order to characterize a drift, two indicators are required:
• Direction indicator: based on the co-linearity of the trajectory of the evolving cluster C_e(t) with the known failure clusters.
• Severity indicator: based on the distance of C_e(t) to C_N and C_{F_i}.

3.2.1 Direction indicator: It is used to pinpoint the cause of the fault (isolation) by studying the direction of the movement of the evolving cluster. To this end, let P_{C_e}(t) = (µ_e(t), Σ_e(t)) and P_{C_e}(t − N_p) = (µ_e(t − N_p), Σ_e(t − N_p)) be the parameters of the evolving cluster at times t and (t − N_p) respectively. The idea behind defining N_p is to consider a time horizon between the centers of two clusters for the calculation of the direction indicator. By choosing a small value for N_p (e.g. N_p = 1), the direction indicator will be highly sensitive to the evolution of the cluster C_e(t) (high plasticity, small stability). By choosing a large value for N_p, the direction indicator will indicate the direction of evolution of C_e(t) over a larger time horizon; thus, it will be more robust to a noisy evolution of C_e(t) but less reactive and slower to detect direction changes (high stability, less plasticity). For this reason, this parameter must be chosen so as to optimize this plasticity-stability dilemma. For the computation of the direction indicator, let µ_{f_i}, i = 1, ..., n_f, be the centers of the failure clusters. Let:

• D_e(t) = \frac{\mu_e(t) - \mu_e(t - N_p)}{\|\mu_e(t) - \mu_e(t - N_p)\|}, the unit vector relating the centers of the clusters C_e(t) and C_e(t − N_p);

• D_i(t) = \frac{\mu_{f_i} - \mu_e(t)}{\|\mu_{f_i} - \mu_e(t)\|}, the unit vector relating the center of C_e(t) and the center of the failure cluster C_{F_i}.

At each step, let p_i(t) = D_e(t)^T D_i(t) and \Gamma(t) = \arg\max_{1 \leq i \leq n_f} p_i(t); p_i(t) is the scalar product between D_e(t) and the vector D_i(t). The closer the value of p_Γ(t) is to 1, the closer D_e(t) and D_Γ(t) are to being colinear. If p_Γ(t) = 1, then the drift is linear, i.e. the trajectory of the centers of the cluster C_e(t) is lined up with the fault. The values of the direction indicator and the rules of assignment at each step follow this algorithm:
1. First calculate p_i(t) and Γ(t);
2. If p_Γ(t) > p_M, where p_M is a user-defined threshold, then it is safe to say that the movement is towards the failure cluster whose number is Γ. In this case, we set direction = Γ. In the case where p_Γ(t) < p_M, the movement of the drift is considered to be towards an unknown region, and the value of direction is 0.
The threshold p_M is user-defined and depends on the application. The larger its value, the more the user is assuming a linear drift.

3.2.2 Severity indicator: The severity indicator must reflect how far the evolving class is from the normal class and how close it is getting to a failure class. Because the clusters are Gaussian, the symmetrised Kullback-Leibler divergence is used to compare them. This divergence between two multivariate Gaussian prototypes P_1 = (µ_1, Σ_1) and P_2 = (µ_2, Σ_2), denoted d_KL(P_1, P_2), is:

d_{KL}(P_1, P_2) = d_{KL}(P_2, P_1) = \frac{1}{2}\left(\mathrm{tr}\left(\Sigma_1^{-1}\Sigma_2 + \Sigma_2^{-1}\Sigma_1\right) + (\mu_1 - \mu_2)^T\left(\Sigma_1^{-1} + \Sigma_2^{-1}\right)(\mu_1 - \mu_2)\right) - d,    (3.6)
where d is the dimension of the feature space. Two cases are possible:
• The direction of drift is towards a known failure cluster C_{F_i}: in this case the severity indicator is denoted sv_i(t) and is given by:

sv_i(t) = \frac{d_{KL}(C_N, C_e(t))}{d_{KL}(C_N, C_e(t)) + d_{KL}(C_e(t), C_{F_i})}.    (3.7)
• The direction of the drift is towards an unknown region of the space: in this case, the severity indicator is the divergence of the evolving class from the normal class and is denoted sv(t) = d_KL(C_N, C_e(t)).
From Eq. (3.7), it is clear that a severity indicator sv_i(t) takes values ranging from 0 to 1 in the case of a drift towards a known class. Under normal operating conditions, C_e(t) ≈ C_N, thus sv_i(t) → 0. Under failure operating conditions (failure i), C_e(t) ≈ C_{F_i}, thus sv_i(t) → 1. It is worth noting that the severity indicator sv_i(t) is computed at each instant t and for all the failure clusters C_{F_i}, i = 1, . . . , n_f. Thus, n_f values of the severity indicator are calculated at each instant. A sketch of the computation of both characterization indicators is given below.
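The following Python sketch illustrates how the two characterization indicators could be computed from the cluster parameters: the direction indicator via the unit vectors and scalar products of section 3.2.1, and the severity indicator via the symmetrised Kullback-Leibler divergence of Eqs. (3.6)-(3.7). The function names and the representation of a cluster as a (mean, covariance) pair are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def kl_sym(p1, p2):
    """Symmetrised Kullback-Leibler divergence between two Gaussian prototypes, Eq. (3.6)."""
    (mu1, s1), (mu2, s2) = p1, p2
    d = mu1.shape[0]
    s1_inv, s2_inv = np.linalg.inv(s1), np.linalg.inv(s2)
    diff = mu1 - mu2
    return 0.5 * (np.trace(s1_inv @ s2 + s2_inv @ s1)
                  + diff @ (s1_inv + s2_inv) @ diff) - d

def direction_indicator(mu_e_t, mu_e_prev, failure_means, p_m=0.85):
    """Direction indicator of section 3.2.1: returns Gamma (fault number, 1-based) or 0 (unknown)."""
    d_e = mu_e_t - mu_e_prev
    d_e = d_e / np.linalg.norm(d_e)                               # D_e(t)
    products = []
    for mu_f in failure_means:
        d_i = (mu_f - mu_e_t) / np.linalg.norm(mu_f - mu_e_t)     # D_i(t)
        products.append(d_e @ d_i)                                # p_i(t)
    gamma = int(np.argmax(products))
    return gamma + 1 if products[gamma] > p_m else 0

def severity_indicator(c_n, c_e, c_f):
    """Severity indicator towards a known failure cluster, Eq. (3.7)."""
    num = kl_sym(c_n, c_e)
    return num / (num + kl_sym(c_e, c_f))
```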
4. Prognosis

Given the drift indicators, the prognosis module must be able to compute the remaining useful life (RUL). The severity indicators calculated in the previous section reflect the evolution of the state of health of the system. In order to be able to compute a RUL, the dynamics of this drift must be estimated. To do this, a recursive auto-regressive (RAR) model is fitted on the severity indicators sv_i(t), i = 1, . . . , n_f. Once the direction of the drift is isolated, the prediction of the future health state of the system is done using the RAR model with its corresponding severity indicator. The idea is to project the severity indicator's path into a future horizon. Thus, two cases are possible:
• Drifting towards a known failure cluster C_{F_i}: in this case, the RAR model for the corresponding severity indicator sv_i is used to compute the RUL. The computation of the RUL is described in the subsection below.
• Drifting towards an unknown failure cluster: in this case, the RUL cannot be computed because the region towards which the cluster is evolving is unknown.
4.1 RAR model estimation and update
Given a severity indicator sv_i(t), i = 1, ..., n_f, a RAR model of order m can be written as:

sv_i(t) = \sum_{j=1}^{m} a_j \, sv_i(t - j) + e(t),    (4.1)

where a_j, j = 1, ..., m, are the model parameters and e(t) is the model noise, which is assumed to be zero-mean and i.i.d. Let Θ_i(t) = [a_1(t), a_2(t), ..., a_m(t)]^T be the parameter vector at time t, and

\phi_i(t) = [sv_i(t-1)\; \ldots\; sv_i(t-m)]^T

the vector of the last m values of the severity indicator. The parameter vector Θ_i(t) is recursively updated using the recursive least squares algorithm:

\hat{\Theta}_i(t) = \hat{\Theta}_i(t-1) + K_i(t)\,(sv_i(t) - \hat{sv}_i(t)),    (4.2)
\hat{sv}_i(t) = \hat{\Theta}_i^T(t-1)\,\phi_i(t),    (4.3)
K_i(t) = Q_i(t)\,\phi_i(t),    (4.4)
Q_i(t) = \frac{P_i(t-1)}{1 + \phi_i^T(t)\,P_i(t-1)\,\phi_i(t)},    (4.5)
P_i(t) = P_i(t-1) - P_i(t-1)\,\phi_i(t)\,\phi_i^T(t)\,Q_i(t),    (4.6)

where \hat{\Theta}_i(0) = 0 and P_i(0) = γI, with γ ≫ 0 an arbitrary large positive number.
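A minimal Python sketch of this recursive least squares update, following Eqs. (4.2)-(4.6), is shown below. The class name RARModel, the default order and the default value of γ are illustrative assumptions; in the architecture, one such model is kept updated for each severity indicator sv_i.

```python
import numpy as np

class RARModel:
    """Recursive auto-regressive model of order m, updated by recursive least squares."""

    def __init__(self, m=2, gamma=1e6):
        self.m = m
        self.theta = np.zeros(m)        # parameter vector Theta_i, Eq. (4.2)
        self.p = gamma * np.eye(m)      # covariance matrix P_i(0) = gamma * I
        self.history = []               # past values of the severity indicator

    def update(self, sv_t):
        """Update the parameter estimate with the newest severity value sv_i(t)."""
        if len(self.history) >= self.m:
            phi = np.array(self.history[-self.m:][::-1])       # phi_i(t) = [sv(t-1), ..., sv(t-m)]
            sv_hat = self.theta @ phi                           # one-step prediction, Eq. (4.3)
            q = self.p / (1.0 + phi @ self.p @ phi)             # Eq. (4.5)
            k = q @ phi                                         # gain K_i(t), Eq. (4.4)
            self.theta = self.theta + k * (sv_t - sv_hat)       # Eq. (4.2)
            self.p = self.p - self.p @ np.outer(phi, phi) @ q   # Eq. (4.6)
        self.history.append(sv_t)
```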
Thus, for each of the severity indicators sv_i(t), i = 1, ..., n_f, a RAR model is kept updated at each time stamp t. Once the direction of the drift is isolated, i.e. Γ is equal to a known direction, the RUL prediction is done using the RAR model that corresponds to Γ.
4.2 RUL prediction
Given a severity indicator sv_Γ(t), the RUL is the time value for which Eq. (4.7) is satisfied:

| sv_\Gamma(t + RUL) - 1 | \leq \varepsilon,    (4.7)

for a given small ε > 0. The prediction into future horizons is done by multistep prediction using the RAR model. At a step t, the multistep prediction beginning at sv_Γ(t) can be obtained in an iterative way:

\hat{sv}_\Gamma(t + H) = \hat{\Theta}_\Gamma^T(t)\,\hat{\phi}_\Gamma(t + H),    (4.8)

where \hat{\Theta}_\Gamma(t) is the estimated parameter vector at time t and \hat{\phi}_\Gamma(t + H) = [\hat{sv}_\Gamma(t + H - 1), ..., \hat{sv}_\Gamma(t + H - m)]^T. The value of H is increased incrementally until the condition in Eq. (4.7) is satisfied. However, it is necessary to give an upper bound for its value, in order to prevent multistep prediction towards infinity. This bound is denoted by H_M and is chosen by the user according to the application. When the drift direction cannot be isolated (drift towards an unknown region), the RUL has no meaning and is given the value −1 in our algorithm; in this case, expert knowledge is called upon to explain the physical meaning of the drift towards the unknown region. The algorithm for the RUL calculation is depicted in Fig. (4).
[Figure 4: flowchart — initialize H = 1 and RUL = −1; compute ŝv_Γ(t + H) using Eq. (4.8); if |ŝv_Γ(t + H) − 1| < ε then RUL = H; otherwise update H and repeat while H ⩽ H_M; if H exceeds H_M, RUL remains −1.]
FIG. 4. Algorithm for the calculation of the RUL.
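The following Python sketch follows the flowchart of Fig. 4: it iterates the multistep prediction of Eq. (4.8) until the predicted severity reaches 1 within ε, or until the horizon bound H_M is exceeded. It is a sketch under the assumptions already stated (the parameter vector comes from the RLS update of section 4.1, and the names are illustrative); the default values ε = 0.05 and H_M = 300 are those used in the case study of section 5.

```python
import numpy as np

def estimate_rul(theta, recent_sv, eps=0.05, h_max=300):
    """Multistep RUL prediction (Eqs. (4.7)-(4.8) and Fig. 4).

    theta     : estimated RAR parameter vector Theta_Gamma(t), length m
    recent_sv : the last m observed values [sv(t-m+1), ..., sv(t)] of the isolated
                severity indicator sv_Gamma
    Returns the estimated RUL (in time steps), or -1 if the failure level is not
    reached within h_max steps.
    """
    m = len(theta)
    window = list(recent_sv[-m:])          # most recent values, oldest first
    for h in range(1, h_max + 1):
        phi = np.array(window[::-1])       # phi_Gamma(t + h) = [sv(t+h-1), ..., sv(t+h-m)]
        sv_hat = float(theta @ phi)        # Eq. (4.8)
        if abs(sv_hat - 1.0) < eps:        # failure level reached, Eq. (4.7)
            return h
        window = window[1:] + [sv_hat]     # slide the window with the predicted value
    return -1                              # horizon bound H_M exceeded
```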
5. Case study: tank system

The database used to test our methodology was simulated using a benchmark of a tank system. Different scenarios including known and unknown faults were simulated. The tank system is shown in Fig. 5. Under the normal operating mode, the level of water is kept between two thresholds, h_1^{HIGH} and h_1^{LOW}. When the level of water reaches h_1^{HIGH}, P_1 is closed and V_2 is opened. When the level of water reaches h_1^{LOW}, P_1 is opened and V_2 is closed. The valve V_1 is used to simulate a leak in the tank. The surface of the valves V_1 and V_2 is the same, S_{V_1} = S_{V_2} = S_V, and the surface of the pump pipe is S_P. The instrumentation used consists of only one sensor for the level of water in the tank, denoted by h_1.
5.1 Considered faults
In order to test our architecture, three faults are considered: two known faults and one unknown.
1. Fault 1 (known): gradual increase of the surface of the valve V_1, leading to a gradual increase of the flow of water leaking from the tank. This surface increases from 0% × S_V to 30% × S_V, the latter being considered as the maximum intensity of leakage; at this stage, the system is considered in failure. When the surface is between 0% × S_V and 30% × S_V, the system is faulty (degraded operation).
2. Fault 2 (known): clogging of the pump P_1, meaning that the flow of water delivered by the pump decreases with time. The same principle as for the simulation of V_1 is used. A clogging of 30% × S_P corresponds to failure.
3. Fault 3 (unknown): clogging of the valve V_2. The same principle as for the simulation of V_1 is used. A clogging of 30% × S_V means failure.
5.2 Feature extraction
At the beginning, three data sets are given: one data set corresponds to the normal operating mode, and two data sets correspond to the fault 1 and fault 2 operating modes respectively. In Fig. 6, we show the sensor measurements under the normal operating mode. A cycle is a sequence of a filling period followed by a draining period (see Fig. (6)). The features extracted from this signal are the time required for
the filling, T_1, and the time required for the draining, T_2. At the end of each cycle j, one feature vector X(j) = [T_1(j) T_2(j)]^T is extracted. Thus, the considered time unit is the index of the cycles, denoted by cy(j), j = 1, . . ..
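As an illustration of this feature extraction step, the sketch below derives the filling and draining durations T_1 and T_2 from a sampled level signal h_1 by detecting the crossings of the two control thresholds. It is only a plausible reconstruction under assumed threshold values and sampling period; the function name and its details are not taken from the benchmark.

```python
import numpy as np

def extract_cycle_features(h1, dt=1.0, h_low=0.1, h_high=0.4):
    """Extract [T1, T2] per filling/draining cycle from the level signal h1.

    h1            : array of level measurements (one sensor, regularly sampled)
    dt            : sampling period (assumed)
    h_low, h_high : the two control thresholds h1_LOW and h1_HIGH (assumed values)
    Returns an array of feature vectors X(j) = [T1(j), T2(j)].
    """
    h1 = np.asarray(h1, dtype=float)
    # Indices where the level crosses h_high upwards (end of a filling period)
    ups = np.flatnonzero((h1[:-1] < h_high) & (h1[1:] >= h_high)) + 1
    # Indices where the level crosses h_low downwards (end of a draining period)
    downs = np.flatnonzero((h1[:-1] > h_low) & (h1[1:] <= h_low)) + 1

    features = []
    for j, start in enumerate(downs[:-1]):
        # a cycle starts at a low crossing; find the next high crossing and the next low crossing
        highs_after = ups[ups > start]
        if len(highs_after) == 0:
            break
        top = highs_after[0]
        end = downs[j + 1]
        t1 = (top - start) * dt     # filling time T1(j)
        t2 = (end - top) * dt       # draining time T2(j)
        features.append([t1, t2])
    return np.array(features)
```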
[Figure 5: schematic of the tank with the pump P_1, the thresholds h_1^{HIGH} = 0.4 m and h_1^{LOW} = 0.1 m (tank dimensions 0.5 m and 0.3 m), the valve V_1 used to simulate the leak, and the valve V_2, the normal operating valve of the system.]
FIG. 5. The tank system.
FIG. 6. Six filling/draining cycles under normal operating conditions.

5.3 Scenarios
Three scenarios are considered:
• Scenario 1: only fault 1 is considered. The drift is simulated linearly from 0% to 30% in 26 cy.
• Scenario 2: only fault 2 is considered. The drift is simulated linearly from 0% to 30% in 60 cy. The drift in this scenario takes more time to complete than in scenario 1; thus, the speed of this drift is lower.
• Scenario 3: fault 2 and fault 3 are considered. At the beginning, only fault 2 is activated. Then, fault 3 is activated with some delay, and both faults thus affect the system together. The result is a change in the direction of the drift, and the idea behind it is to test the ability of our algorithm to detect this case.
5.4 Results
The parameters of the overall algorithm are: λ_m = 0.02 and N_win = 30 (section 2); α = 99.5%, Th = 2.1, F_{(d, N_win−d)|α} = 3.44, p_M = 0.85 and N_p = 4 (section 3); ε = 0.05 and H_M = 300 (section 4). In all scenarios, the drift was started at t_drift = 20 cy. The results for the scenarios are depicted in the figures below. The obtained results confirm that our architecture is efficient in handling drifts. The scenarios we chose helped to show how the different aspects of drift detection, diagnosis and prognosis are related. The map showing the normal mode and the different known and unknown modes is presented in Fig. (7). In scenarios 1 and 2, the drift was towards a known region. The drift was correctly detected in both scenarios. We notice that the detection time is longer in scenario 2 than in scenario 1 (Fig. (8)). The reason is that the speed of drift in scenario 2 is lower than in scenario 1.
FIG. 7. Map of different known and unknown regions corresponding to normal and failure modes.
FIG. 8. Drift detection results for scenarios 1 and 2.
FIG. 9. Severity indicator results for scenarios 1 and 2.
FIG. 10. Direction indicator results for scenarios 1, 2 and 3.
The severity indicator ranges from 0 (normal operating mode) to 1 (failure operating mode). It was computed in the cases where the direction is known (Fig. (9)). In scenario 3, the direction indicator changes its value from 2 to 0. This reflects a change in the direction of the evolution of the clusters in the feature space, which can only mean that there is another fault affecting the drift (Fig. (10) and Fig. (11)). The same unknown zone is detected even when two known faults are simulated. Thus, in this case, the unknown zone reflects the need to update the dysfunctional analysis using expert knowledge. The drift detection for this scenario is the same as in scenario 2, because fault 2 was simulated at the beginning, and so it is not shown in Fig. (8). Concerning prognosis, an estimated value of the RUL is given only when the failure mode is known, i.e. when the value of the direction indicator is known. A RUL value is estimated at each instant t using the algorithm depicted in Fig. (4). It is seen in Fig. (12) and Fig. (13) that the error between the estimated RUL and the real RUL converges to 0 as time passes. It is also seen that the time to convergence is related to the speed of drift.
FIG. 11. Illustration, in the feature space, of the direction change in scenario 3.
FIG. 12. Real RUL vs estimated RUL for scenario1.
FIG. 13. Real RUL vs estimated RUL for scenario2.
A higher drift speed means a faster convergence of the algorithm, whereas a slower drift speed means a slower convergence.

6. Conclusion and perspectives

In this paper, an architecture of supervision, diagnosis and prognosis of dynamical systems is developed. The aim is the condition monitoring and health assessment of dynamical systems with known and unknown failure modes. Only incipient faults, also known as drifts, are considered. A methodology for detecting and characterizing drifts was employed. The methodology is based on monitoring the parameters of dynamically updated clusters. A drift detection block is then used to detect drifts. In case of positive detection, two drift characterization indicators are computed: the direction indicator and the severity indicator. When a known failure mode is detected behind the incipient fault, a prognosis module is triggered to provide a RUL estimate.
In addition, the architecture allows the detection of new unknown zones. These zones correspond to either unknown failure modes or multiple failures. A multiple failure mode is caused by a combination of known and unknown faults. In this work, whatever the case, the resulting detection of an unknown zone paralyses the ability of the methodology to estimate a RUL, because of the limited knowledge on these zones. However, in the case of a multiple failure mode, the detection tool can still be used if a known fault starts before the appearance of an unknown fault. Thus, a call for experts is needed. The expert knowledge must attempt to explain the physics behind the unknown modes. If possible, it could also provide clues for a RUL estimate and update the dysfunctional analysis.
Our approach was tested on a case study of a tank system with different incipient fault scenarios. The results confirm the efficiency of the architecture. This methodology is designed to work in any dynamical environment where pertinent features can be obtained. A decision space with overlapping clusters will limit the capacity of this method to generate efficient results; thus, the feature extraction part is very important.
In this work, only a RUL estimate is given, without a confidence interval associated with it. Confidence intervals are associated with the decision-making process concerning the maintenance actions. Thus, it is interesting to develop this idea in future work. Another idea that is worthy of more research development also concerns the RUL estimation. For instance, it was shown in the results that if the speed of drift is low, the RUL estimation can take some time in the beginning before it becomes available. Thus, a relevant perspective is how to reduce this initial time.

References

C. Alippi, et al. (2009). 'Just in time classifiers: managing the slow drift case'. Proceedings of International Joint Conference on Neural Networks.
M. Baena-Garcia, et al. (2006). 'Early drift detection method'. ECML PKDD International Workshop on Knowledge Discovery from Data Streams, pp. 77–86.
M. Basseville & I. Nikiforov (1993). Detection of Abrupt Changes: Theory and Application. Prentice Hall, Inc.
H. A. Boubacar, et al. (2005). 'AUDyC Neural Network using a new Gaussian Densities Merge Mechanism'. In 7th International Conference on Adaptive and Natural Computing Algorithms, Coimbra, Portugal, pp. 155–158.
H. A. Boubacar, et al. (2004). 'AUDyC Neural Network using a new Gaussian Densities Merge Mechanism'. In 7th Conference on Adaptive and Natural Computing Algorithms, pp. 155–158.
K. Boukharouba, et al. (2009). 'Incremental and Decremental Multi-Category Classification by Support Vector Machines'. In 8th International Conference on Machine Learning and Applications, ICMLA, Florida, USA.
T. Brotherton, et al. (2000). 'Prognosis of Faults in Gas Turbine Engines'. In IEEE Aerospace Conference.
C. Byington, et al. (2002). 'Prognostic Enhancements to Diagnostic Systems for improved Condition-Based Maintenance'. In IEEE Aerospace Conference.
C. Byington, et al. (2003). 'Prognostic enhancements to gas turbine diagnostic systems'. In IEEE Aerospace Conference 103:137–143.
A. Chammas, et al. (2011). 'Supervision of switching systems based on dynamical classification approach'. In European Safety and Reliability Association ESREL'10.
A. Chammas, et al. (2012). 'Condition monitoring architecture for maintenance of dynamical systems with unknown failure modes'. In IFAC A-MEST Workshop on Maintenance for Dependability, Asset Management and PHM, Sevilla, Spain.
M. Desinde, et al. (2006). 'Tool and methodology for online risk assessement of process'. In LambdaMu 15, Lille.
G. Ditzler & R. Polikar (2011). 'Hellinger Distance Based Drift Detection for Nonstationary Environments'. CIDUE, Symposium on Computational Intelligence in Dynamic and Uncertain Environments, pp. 41–48.
J. Gama, et al. (2004). 'Learning with drift detection'. Lecture Notes in Computer Science 3171.
R. Isermann (2005). 'Model-based fault detection and diagnosis - status and applications'. Annual Reviews in Control 29:71–85.
A. K. Jardine, et al. (2006). 'A review on machinery diagnostics and prognostics implementing condition-based maintenance'. Mechanical Systems and Signal Processing 20:1483–1510.
L. Kuncheva (2009). 'Using Control Charts for Detecting Concept Change in Streaming Data'. Tech. rep., Technical Report BCS-TR-001.
L. I. Kuncheva (2004). 'Classifier Ensembles for Changing Environments'. Lecture Notes in Computer Science 3077:1–15.
M. Lebold & M. Thurston (2001). 'Open standards for condition-based maintenance and prognosis systems'. In the 5th Annual Maintenance and Reliability Conference, Gatlinburg, USA.
S. Lecoeuche & C. Lurette (2003). 'Auto-Adaptive and Dynamical Clustering Neural Network'. In ICANN'03 Proceedings.
G. Li, et al. (2010). 'Reconstruction based fault prognosis for continuous processes'. Control Engineering Practice 18:1211–1219.
D. L.Iverson (2004). 'Inductive System Health Monitoring'. In Proceedings of The 2004 International Conference on Artificial Intelligence (IC-AI04), CSREA Press, Las Vegas, NV.
W. Meeker & L. Escobar (1998). Statistical Methods for Reliability Data. Wiley Series in Probability and Statistics, John Wiley and Sons, New York, USA.
L. L. Minku & X. Yao (2011). 'DDD: A New Ensemble Approach For Dealing With Concept Drift'. IEEE Transactions on Knowledge and Data Engineering.
A. Muller, et al. (2008a). 'On the concept of e-maintenance: Review and current research'. Reliability Engineering and System Safety 93:1165–1187.
A. Muller, et al. (2008b). 'Formalisation of a new prognosis model for supporting proactive maintenance implementation on industrial system'. Reliability Engineering and System Safety 93:234–253.
J. MvBain & M. Timusk (2009). 'Fault detection in variable speed machinery: Statistical parametrization'. Journal of Sound and Vibration 327:623–646.
K. Nishida & K. Yamauchi (2007). 'Detecting Concept Drift Using Statistical Testing'. Discovery Science, Springer.
V. Noortwijk (2009). 'A survey of the application of gamma process in maintenance'. Reliability Engineering & System Safety 94:2–21.
C. Oppenheimer & K. Loparo (2002). 'Physically based diagnosis and prognosis of cracked rotor shafts'. In Proceedings of SPIE.
M. Pechenizkiy & I. Zliobaite (2013). 'Introduction to the special issue on handling concept drift in adaptive information systems'. Evolving Systems 4:1–2.
F. Peysson, et al. (2009). 'A Data Driven Prognostic Methodology without a Priori Knowledge'. In Proceedings of the 7th IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes.
F. Peysson, et al. (2008). 'Damage trajectory analysis based prognostic'. In International Conference on Prognostics and Health Management, PHM 2008.
M. Sayed-Mouchaweh & E. Lughofer (eds.) (2012). Learning in Non-Stationary Environments. Springer.
X.-S. Si, et al. (2011). 'Remaining useful life estimation - A review on the statistical data driven approaches'. European Journal of Operational Research 213:1–14.
J. Sikorska, et al. (2011). 'Prognostic modeling options for remaining useful life estimation by industry'. Mechanical Systems and Signal Processing 25:1803–1836.
M. Traore, et al. (2009). 'Dynamical classification method to provide pertinent indicators for predictive maintenance strategies'. ICINCO'09, Milan, Italy, 2-5 July.
G. Vachtsevanos, et al. (2006). Intelligent Fault Diagnosis and Prognosis for Engineering Systems. John Wiley and Sons, Inc.
W. E. Vesely, et al. (1981). Fault Tree Handbook. US Nuclear Regulatory Commission, Washington D.C., USA.
J. Yan, et al. (2002). 'Predictive Algorithm for Machine Degradation Using Logistic Regression'. In MIM'2002.
I. Zliobaite (2009). 'Learning under Concept Drift: an Overview'. Tech. rep., Vilnius University.