2006 International Joint Conference on Neural Networks Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada July 16-21, 2006
An Autonomous Diagnostics and Prognostics Framework for Condition-Based Maintenance

Pundarikaksha Baruah, Ratna Babu Chinnam, and Dimitar Filev

Abstract— This paper presents an innovative on-line approach for autonomous diagnostics and prognostics. It overcomes limitations of current diagnostics and prognostics technology by developing a "generic" framework that is relatively independent of the type of physical equipment under consideration. The proposed Diagnostics and Prognostics Framework (DPF) is based on unsupervised learning methods, reducing the need for human intervention. The procedures used in DPF are designed to temporally evolve the critical parameters with monitoring experience for enhanced diagnostic/prognostic accuracy, a critical ability for mass deployment of the technology on a variety of equipment/hardware without needing extensive initial tune-up. This framework is currently under deployment in a major automotive manufacturing plant in Michigan, USA. Results from this pilot program to date are very satisfactory.
This work was supported in part by NSF DMI Grant 0300132 and Ford Motor Company. Pundarikaksha Baruah is a Doctoral Candidate at Wayne State University, Detroit, MI 48201 USA (e-mail: [email protected]). Ratna Babu Chinnam, Ph.D., is an Associate Professor of Industrial and Manufacturing Engineering, Wayne State University, Detroit, MI 48201 USA (corresponding author; phone: 313-577-4846; fax: 303-578-5902; e-mail: [email protected]). Dimitar Filev, Ph.D., is with the Ford Motor Company, Dearborn, MI 48121, USA (e-mail: [email protected]).

I. INTRODUCTION

Diagnostics has traditionally been defined as the ability to detect and sometimes isolate a faulted component and/or failure condition [1]. Prognostics builds upon the diagnostic assessment and is defined here as the capability to predict the progression of this fault condition to component failure. In the last two decades, tremendous advances have been made in sensing hardware, IT infrastructure, signal processing algorithms, and modeling methods; still, on-line diagnostics/prognostics are largely reserved for only the most critical system components. This technology has not yet found its place in the health management of mainstream machinery and equipment [2].

The concept of autonomous diagnostics is based on unsupervised techniques; the term 'unsupervised' implies the ability to learn without human supervision. Autonomous diagnostics methods learn gradually from the system onto which they are deployed, and can therefore be deployed onto a variety of systems with ease. Once developed, no equipment-specific fine-tuning should be required.

A primary challenge in performing effective diagnostics of machinery and equipment is the need to achieve a high
degree of accuracy in classifying the system's health state in real-time given some sensory signals. While the vast extant literature reports good success in developing highly effective diagnostic algorithms for certain classes of components and equipment (such as bearings, centrifugal pumps, and electrical motors), most of these successes rest on decades of academic and industrial research and on extensive characterization and modeling of equipment behavior through mechanistic (i.e., physics-driven) modeling [3,4]. While such efforts are warranted for mission-critical systems (where failure might involve loss of life or incur large financial costs), we need cost-effective technologies that facilitate autonomous diagnostics and prognostics. The goal is to develop "generic" diagnostic and prognostic algorithms and technology that can be rapidly configured, calibrated, and refined using unsupervised learning algorithms to facilitate effective and efficient large-scale deployment of condition-based maintenance (CBM) technology.

Algorithmic novelties of the proposed autonomous diagnostics and prognostics framework (DPF) include: (i) multi-basis clustering and (ii) optimized cluster tracking. The multi-basis clustering procedure combines principal component analysis (PCA) based dimensionality reduction with an unsupervised clustering technique. Initially, a single principal component (PC) transformation matrix (called the raw basis) is constructed from the signal/feature data. A kernel density based unsupervised clustering technique is then employed to cluster the data in the space of the two most dominant PCs, to identify distinct equipment "modes of operation". Data points belonging to individual clusters or modes are then identified using sets of indices, and a PC transformation matrix is recomputed for each individual cluster or mode using the corresponding index set, leading to a different mode basis for each distinct operating mode/cluster. The diagnostics engine employs these bases for raising any pertinent alarms during equipment monitoring.

Given that equipment behavior evolves due to such processes as wear-in, maintenance, and wear-out, it is critical that DPF effectively track this non-stationary behavior. To address this issue, DPF employs an optimal cluster tracking procedure using an optimal exponential weighting scheme. In particular, it employs two novel strategies to enhance the performance of the diagnostics engine. First, the on-line determination of an optimal exponential discounting factor ensures that cluster tracking matches the (rate of) evolution of the equipment operating mode behavior.
Second, the provision to allow differing exponential discounting factors for different clusters further enhances the performance of the diagnostics engine. Optimality of the discounting factor is established based on an objective function that employs a generalized statistical distance (also called Mahalanobis distance) cost function in the dominant PC space.

Subsequent sections describe the proposed framework and its design. The document is organized as follows: Section II outlines the building blocks of DPF. Section III provides details of the algorithms used in the framework. Section IV presents a case study to demonstrate the effectiveness of DPF, followed by concluding remarks pertaining to future research in Section V.

II. BASIC ELEMENTS OF THE DPF

The basic elements of DPF are illustrated in Figure 1. The research outlined in this document primarily focuses on the following techniques that are essential to the DPF framework: data dimensionality reduction, clustering and classification, forecasting, diagnostics, prognostics, and decision-making.

Fig. 1. Architecture of the proposed autonomous diagnostics and prognostics framework. (Block diagram: input features/signals flow through dimensionality reduction, clustering/classification, and signal enveloping into the diagnostics engine, and through forecasting into an identical chain for the prognostics engine; both feed a maintenance decision support system, which closes the loop with end-user feedback.)
The framework assumes that features derived from sensor signal(s) collected through data acquisition device(s) are available as input vectors. Depending on the type of sensor employed, the raw signal itself can sometimes be used as a feature, requiring no feature extraction process. On the other hand, the type of feature extraction method(s) employed can have a dramatic effect on the performance of the diagnostics/prognostics engines. DPF is essentially an autonomous filter for anomaly detection that takes features/signals and yields diagnostic and prognostic results for maintenance decision support. (A maintenance decision support system (DSS) uses diagnostic/prognostic results and recommends necessary maintenance actions.) DPF is designed for both off-line and on-line modes of operation. The following sections describe the procedures used in DPF; they apply to both modes.

A. Inputs for DPF

Inputs to the DPF are features or raw signals. Signals collected at a high sampling rate (e.g., accelerometer signals) are typically transformed into features. The extant literature recommends hundreds, even thousands, of feature extraction techniques for different types of equipment and sensing hardware. Features are essentially transformed original signals. The transformations typically used can be classified in several ways:
1) Based on theory: (i) features with physical meanings (e.g., power spectral density) and (ii) features with no physical meaning but found to be useful indicators of health (e.g., kurtosis, autoregressive model parameters, etc.)
2) Based on dimensionality: (i) features that increase the dimensionality of the original signal (e.g., power spectrum) and (ii) features that reduce the dimensionality of the original signal (e.g., statistical mean)
3) Based on domain of analysis: (i) time domain features (e.g., sample variance), (ii) frequency domain features (e.g., n peaks of the Fourier transform), and (iii) mixed domain features (e.g., wavelet coefficients)

DPF makes provision to incorporate a wide variety of numerical features by allowing external feature extraction as well as a selection module that feeds standard features to DPF. While DPF works with features derived from any/all of the above classes of feature extraction techniques, the main requirement is that individual features contain numerical values. Given that DPF is a purely empirical framework, it makes no explicit differentiation between one feature and another. There is no limit on the maximum number of features allowed; however, the minimum number of features required is two.

B. Dimensionality reduction technique

Because most equipment yields quite a large set of features while the data points available for initializing and calibrating the algorithms tend to be limited, the resulting data sparsity poses significant challenges. For this reason, feature dimensionality reduction is a crucial step in DPF. DPF aggregates the input features or signals and transforms the feature vector to two dimensions for subsequent analysis. DPF relies heavily on Principal Component Analysis (PCA) [4] as the means for dimensionality reduction. PCA is a rotational transform that computes the 'principal components' by multiplying original feature vectors with a square transformation matrix called the 'basis matrix'. In an on-line setting, the basis matrix is updated recursively as new points enter the system; this recursive estimation becomes critical when monitoring non-stationary processes.

C. Clustering

DPF employs a clustering method in the two-dimensional principal component space to detect and characterize potentially distinct equipment modes of operation. The framework is independent of the clustering method. Given that autonomous clustering is an ongoing research area, this independence enables the framework to incorporate novel clustering algorithms that add to the effectiveness of DPF. It currently supports
Kernel Density Estimation based clustering [6,7,8] as well as Gaussian Mixture Model based clustering [9]. Once clustering is performed, each cluster is characterized by a mean vector and a covariance matrix, forming a two-tuple.

D. Classification

DPF assigns a new feature vector to existing clusters based on the smallest generalized statistical distance (also called the Mahalanobis distance [10]):

$$D = (X_{new} - \hat{\mu}_j)^T \hat{\Sigma}_j^{-1} (X_{new} - \hat{\mu}_j)$$

Classification is done after dimensionality reduction, in the two-dimensional PC space.

E. Recursive $(\mu, \sigma^2)$ estimate

DPF employs the exponentially weighted moving average method for recursive estimation of the means and covariance matrices. For feature $j$, using the new observation $x_j$, the recursive estimation expressions (used for constructing the feature envelope) are:

$$\hat{\mu}_j(new) = \alpha \hat{\mu}_j(old) + (1 - \alpha) x_j(new)$$

$$\hat{\sigma}_j^2(new) = \alpha \hat{\sigma}_j^2(old) + (1 - \alpha) (x_j(new) - \hat{\mu}_j(old))^2$$

For the complete feature vector $X$, the recursive estimation expressions (used for updating the PC basis) are:

$$\hat{\mu}(new) = \alpha \hat{\mu}(old) + (1 - \alpha) X(new)$$

$$\hat{\Sigma}(new) = \alpha \hat{\Sigma}(old) + (1 - \alpha) (X(new) - \hat{\mu}(old))^T (X(new) - \hat{\mu}(old))$$
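As a minimal sketch of these recursive updates (assuming NumPy; the function names and the vector orientation are illustrative, not taken from the DPF implementation):

```python
import numpy as np

def update_feature_stats(mu_j, var_j, x_j, alpha):
    """EWMA update of one feature's mean and variance (feature envelope)."""
    mu_new = alpha * mu_j + (1.0 - alpha) * x_j
    var_new = alpha * var_j + (1.0 - alpha) * (x_j - mu_j) ** 2
    return mu_new, var_new

def update_vector_stats(mu, sigma, x, alpha):
    """EWMA update of the feature vector's mean and covariance (for the PC basis)."""
    diff = x - mu                                  # deviation from the old mean
    mu_new = alpha * mu + (1.0 - alpha) * x
    sigma_new = alpha * sigma + (1.0 - alpha) * np.outer(diff, diff)
    return mu_new, sigma_new
```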
F. Velocity

Standardized velocity within individual clusters is estimated from consecutive feature vector entries as follows. If $X_1$ and $X_2$ denote the most recent consecutive feature vectors in $R^n$, collected at time instants $t_1$ and $t_2$, then the standardized velocity is calculated as:

$$V_{t_2} = \frac{(Z_2 - Z_1)^T (Z_2 - Z_1)}{n (t_2 - t_1)}$$

where $Z$ is the standardized feature vector obtained by standardizing each element of $X$ as follows:

$$z_j = \frac{x_j - \mu_j}{\sigma_j}, \quad \forall j$$
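A small sketch of this calculation under the same NumPy assumption; `mu` and `sigma` here stand for the per-feature means and standard deviations used for standardization:

```python
import numpy as np

def standardized_velocity(x1, x2, t1, t2, mu, sigma):
    """Standardized velocity between two consecutive feature vectors in R^n."""
    z1 = (x1 - mu) / sigma        # standardize each element
    z2 = (x2 - mu) / sigma
    diff = z2 - z1
    return float(diff @ diff) / (x1.size * (t2 - t1))
```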
G. Diagnostics

Within DPF, diagnostics is carried out based on three independent methods of analysis: (a) diagnostics based on classification (called $\Re_C$), (b) diagnostics based on feature/signal enveloping (called $\Re_{SPC}$), and (c) diagnostics based on velocity thresholds (called $\Re_V$). These three domains contribute to the overall diagnostics result, a number called the 'severity rating' $S_R$, computed through a voting algorithm as follows. Let $r_c$ denote the contribution of $\Re_C$ to $S_R$ ($0 \le r_c \le 1$), $r_{spc}$ the contribution of $\Re_{SPC}$ ($0 \le r_{spc} \le 1$), and $r_v$ the contribution of $\Re_V$ ($0 \le r_v \le 1$). Then $S_R$ is computed as:

$$S_R = w_c r_c + w_{spc} r_{spc} + w_v r_v, \quad 0 \le S_R \le 1$$

where $w_i$ ($0 \le w_i \le 1$) are the weights assigned to each of the three diagnostic decision-making domains. In the absence of any a priori knowledge about which domain might provide better diagnostics, all $w_i$ can be set equal, in this case to one third (1/3). Note that while the proposed severity rating calculation uses three diagnostic domains, minor modifications can support a fourth and additional domains.
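The voting step itself reduces to a weighted sum; a minimal sketch, with the equal default weights suggested above (names are illustrative):

```python
def severity_rating(r_c, r_spc, r_v, weights=(1/3, 1/3, 1/3)):
    """Combine the three domain results (each in [0, 1]) into S_R."""
    w_c, w_spc, w_v = weights
    return w_c * r_c + w_spc * r_spc + w_v * r_v
```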
1) Diagnosis based on classification: Here diagnosis is based on a heuristic that assigns a new point $X_{new}$ either to an existing cluster/class $C_i$ or labels it an outlier to $C_i$. The criterion for labeling a point an 'outlier' is:

$$(X_{new} - \hat{\mu}_j)^T \hat{\Sigma}_j^{-1} (X_{new} - \hat{\mu}_j) \ge \chi_d^2(\beta) \quad \forall j$$

This implies that if $X_{new}$ is not within the $\beta\%$ (typically $\ge 99\%$) probability contour of $N(\mu_j, \Sigma_j)$ for any cluster, but is closer to cluster $j$ than to any other cluster in terms of the generalized statistical distance, the data point is labeled an outlier to cluster $j$. Three different cases are considered for diagnostics: (i) Point $X_{new}$ belongs to cluster $C_i$: the point lies within the normal behavior limits and the diagnosis result is 'normal' ($r_c = 0$). (ii) Point $X_{new}$ is an outlier to $C_i$: the point is outside the normal behavior limits, and hence it is likely that the equipment behavior is 'abnormal' ($r_c = 0.5$). (iii) $m$ consecutive points are outliers: this case implies that the system is 'abnormal' with high probability, and hence the highest severity value in $\Re_C$ is assigned ($r_c = 1$). We typically choose $m = 3$.

2) Diagnosis based on feature/signal enveloping: For each of the features/signals, signal envelopes are constructed recursively using $\pm k\sigma$ limits; the actual expressions are based on the equations shown in Section II-E. A new feature point $x_j$ is considered an outlier if $|x_j - \hat{\mu}_{x_j}| \ge k \hat{\sigma}_{x_j}$. If, among the $n$ features, $n_1$ features fall beyond the $\pm k\sigma$ limits, the severity value is set at $r_{spc} = n_1 / n$.
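A sketch of both tests, assuming NumPy/SciPy and clusters summarized by mean/covariance pairs in the 2D PC space; the consecutive-outlier bookkeeping via a caller-maintained list and the value k = 3 are illustrative assumptions:

```python
import numpy as np
from scipy.stats import chi2

def classification_severity(x_new, cluster_mus, cluster_sigmas,
                            recent_outliers, beta=0.99, m=3):
    """r_c from the chi-square outlier criterion in the 2D PC space."""
    dists = [float((x_new - mu) @ np.linalg.inv(S) @ (x_new - mu))
             for mu, S in zip(cluster_mus, cluster_sigmas)]
    is_outlier = min(dists) >= chi2.ppf(beta, df=2)  # outside beta% contour of all clusters
    recent_outliers.append(is_outlier)
    if not is_outlier:
        return 0.0                                   # case (i): normal
    if len(recent_outliers) >= m and all(recent_outliers[-m:]):
        return 1.0                                   # case (iii): m consecutive outliers
    return 0.5                                       # case (ii): single outlier

def envelope_severity(x, mu_hat, sigma_hat, k=3.0):
    """r_spc: fraction of features outside the +/- k*sigma envelope."""
    n1 = int(np.sum(np.abs(x - mu_hat) >= k * sigma_hat))
    return n1 / x.size
```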
3) Diagnosis based on velocity thresholds: The expression for calculating the standardized velocity is shown in Section II-F. $r_v$ is assigned a value of 1 if $V > V_{th}$; otherwise $r_v$ is set to 0. Typically $V_{th}$ is set to 10.

H. Prognostics

As illustrated in Figure 1, the prognostics engine employs techniques similar to the diagnostics engine; the essential difference is that its inputs are forecast signals. Each feature/signal is considered a time series $x_t$. A univariate time series forecasting method is employed to predict the values of $x_{t+1}$, $x_{t+2}$, $x_{t+3}$, and $x_{t+4}$. (In our implementation an autoregressive time series model of order 7, AR(7), is fitted to $x_t$ and used for forecasting.) The prognostics module calculates a severity rating on the forecasted observations just as the diagnostics module would. However, this severity rating is based only on the classification and feature enveloping methods, not the velocity threshold method.
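As an illustration of the forecasting step, a least-squares AR fit is sketched below (assuming NumPy; the paper does not specify the fitting method, so ordinary least squares is an assumption here):

```python
import numpy as np

def ar_forecast(x, order=7, steps=4):
    """Fit an AR(order) model to series x by least squares and forecast
    `steps` values ahead (x_{t+1}, ..., x_{t+steps})."""
    # Lagged design matrix: each row holds [x[t-order], ..., x[t-1]]
    X = np.column_stack([x[i:len(x) - order + i] for i in range(order)])
    y = x[order:]
    coeffs, *_ = np.linalg.lstsq(
        np.column_stack([np.ones(len(y)), X]), y, rcond=None)
    history = list(x[-order:])              # most recent `order` observations
    forecasts = []
    for _ in range(steps):
        pred = coeffs[0] + float(np.dot(coeffs[1:], history))
        forecasts.append(pred)
        history = history[1:] + [pred]      # roll the window forward
    return np.array(forecasts)
```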
III. DPF ALGORITHMS

This section provides detailed input/output diagrams and flowcharts of the DPF software implementation algorithms. The DPF software involves three distinct phases. The first is the 'initialization phase', in which important estimates are made through dimensionality reduction and multi-basis clustering procedures. Once the initialization phase has executed successfully, the software enters an 'optimization phase', in which optimal forgetting factors for exponential weighting are calculated. This phase is optional in DPF, so the user can instead choose a pre-set forgetting factor. Finally, the software enters the 'execution phase', in which it begins to perform on-line diagnostics and prognostics. Once the program enters the execution phase, it operates in an infinite loop; diagnostics/prognostics are performed as new data is fed to the program. The user can terminate the program at any time.
A. Initialization phase in DPF: multi-basis clustering
Initially, data is collected from the sensor(s) until $N_i$ (say 100) data points are available. These are transformed into feature vectors of dimension $M$, giving a matrix $X$ of size $N_i \times M$. Hereafter this matrix is called the dataset throughout this phase.

The following steps are performed during the initialization phase:

Step 1: Calculate the mean and covariance of $X$, denoted $\mu_X$ and $\Sigma_X$, respectively.
Step 2: Normalize $X$ using $\mu_X$ and $\Sigma_X$. Denote the normalized feature matrix as $Z$.
Step 3: Calculate the covariance matrix of $Z$. Denote it as $\Sigma_Z$.
Step 4: Using $\Sigma_Z$, calculate a PCA basis $B_Z$ for the normalized data $Z$.
Step 5: Calculate the principal components of $Z$ (an orthogonal transform of $Z$). Denote them by $O_Z$.
Step 6: Collect the two dominant PCs from $O_Z$, denoted $1^{st}O_Z$ and $2^{nd}O_Z$.
Step 7: Use an autonomous clustering algorithm to cluster $[1^{st}O_Z, 2^{nd}O_Z]$ in 2D space. Let these clusters be $C_O^1, C_O^2, \ldots, C_O^k$.
Step 8: Collect the cluster indexes of the 2D dataset $[1^{st}O_Z, 2^{nd}O_Z]$ using the 2D clustered dataset $C_O^1, C_O^2, \ldots, C_O^k$.
Step 9: Use those indexes to cluster $X$. Denote the clusters as $C_X^1, C_X^2, \ldots, C_X^k$.
Step 10: Find the mean and covariance of each cluster in the set $[C_O^1, C_O^2, \ldots, C_O^k]$. Denote them as $\mu_{C_O^1}, \ldots, \mu_{C_O^k}$ and $\Sigma_{C_O^1}, \ldots, \Sigma_{C_O^k}$, respectively.
Step 11: Find the mean and covariance of each cluster in the set $[C_X^1, C_X^2, \ldots, C_X^k]$. Denote them as $\mu_{C_X^1}, \ldots, \mu_{C_X^k}$ and $\Sigma_{C_X^1}, \ldots, \Sigma_{C_X^k}$, respectively.
Step 12: Use the set of indexes found in Step 8 to cluster $Z$. Denote the clusters as $C_Z^1, C_Z^2, \ldots, C_Z^k$.
Step 13: Find the covariance matrix of each cluster in the set $[C_Z^1, C_Z^2, \ldots, C_Z^k]$. Denote them as $\Sigma_{C_Z^1}, \ldots, \Sigma_{C_Z^k}$.
Step 14: Use $\Sigma_{C_Z^1}, \ldots, \Sigma_{C_Z^k}$ to find the PCA basis of each cluster in the set $[C_Z^1, C_Z^2, \ldots, C_Z^k]$. Denote those bases as $B_Z^1, B_Z^2, \ldots, B_Z^k$.

At the end of this phase, the following estimates are available and are saved for subsequent calculations (Table I).

TABLE I
ESTIMATES AT THE END OF THE INITIALIZATION PHASE

| Estimate | Space | Explanation |
|---|---|---|
| $\mu_X$, $\Sigma_X$ | Original space ($R^o$) | Means and variances of original features/signals |
| $\mu_{C_X^1}, \ldots, \mu_{C_X^k}$; $\Sigma_{C_X^1}, \ldots, \Sigma_{C_X^k}$ | Original space ($R^o$) | Means and covariances of clusters in the original space |
| $\mu_{C_Z^1}, \ldots, \mu_{C_Z^k}$; $\Sigma_{C_Z^1}, \ldots, \Sigma_{C_Z^k}$ | Standardized space ($R^s$) | Means and covariances of clusters after normalizing |
| $B_Z^1, \ldots, B_Z^k$ | Standardized space ($R^s$) | PC-basis |
| $\mu_{C_O^1}, \ldots, \mu_{C_O^k}$; $\Sigma_{C_O^1}, \ldots, \Sigma_{C_O^k}$ | PC space ($R^{PC}$) | Means and covariances of clusters in the space of the two most dominant PCs |
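Steps 1-14 can be condensed as follows (a sketch assuming NumPy and any unsupervised `cluster_fn` mapping the 2D PC scores to integer labels, e.g., a kernel-density or Gaussian-mixture clusterer such as `sklearn.mixture.GaussianMixture(...).fit_predict`; all names are illustrative):

```python
import numpy as np

def initialize_dpf(X, cluster_fn):
    """Multi-basis clustering initialization over the (N_i x M) dataset X."""
    # Steps 1-2: global statistics and standardization
    mu_X = X.mean(axis=0)
    sigma_X = np.cov(X, rowvar=False)
    Z = (X - mu_X) / np.sqrt(np.diag(sigma_X))
    # Steps 3-6: PCA basis of Z; keep the two dominant PCs
    sigma_Z = np.cov(Z, rowvar=False)
    eigvals, B_Z = np.linalg.eigh(sigma_Z)
    top2 = np.argsort(eigvals)[::-1][:2]
    O2 = Z @ B_Z[:, top2]                  # scores on the two dominant PCs
    # Steps 7-9: cluster in the 2D PC space; reuse the index sets everywhere
    labels = cluster_fn(O2)
    modes = {}
    for c in np.unique(labels):
        idx = labels == c
        modes[c] = {
            # Step 10: cluster statistics in PC space
            "mu_O": O2[idx].mean(axis=0), "sigma_O": np.cov(O2[idx], rowvar=False),
            # Step 11: cluster statistics in the original space
            "mu_X": X[idx].mean(axis=0), "sigma_X": np.cov(X[idx], rowvar=False),
            # Steps 12-13: cluster covariance in the standardized space
            "sigma_Z": np.cov(Z[idx], rowvar=False),
        }
        # Step 14: per-cluster PCA basis (the "mode basis")
        modes[c]["B_Z"] = np.linalg.eigh(modes[c]["sigma_Z"])[1]
    return mu_X, sigma_X, B_Z, modes
```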
B. Optimization phase in DPF: optimization of cluster tracking

The procedure used for this phase is outlined below:

Step 1: Collect $N_o$ points, starting from data point $(N_i + 1)$ to data point $(N_i + N_o)$. Calculate the $M$-dimensional feature vector for each point and store them in the form of a feature matrix $Y$ of size $N_o \times M$.

Step 2: For the $N_o$ points, calculate the sum of generalized statistical distances for a particular $\alpha$ as follows:

$$D(\alpha) = \sum_{i=1}^{N_o} (Y_i - \mu_{t-1}(\alpha))^T \Sigma_{t-1}(\alpha)^{-1} (Y_i - \mu_{t-1}(\alpha))$$

Step 3: The optimal $\alpha$ for recursive estimation of the set $[\mu, \Sigma]$ is

$$\alpha^* = \arg\min_{0 < \alpha < 1} D(\alpha)$$
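A sketch of the grid search implied by Steps 2-3 (assuming NumPy; the search grid and the choice to update $\mu$ and $\Sigma$ with the same $\alpha$ are illustrative assumptions):

```python
import numpy as np

def optimal_alpha(Y, mu0, sigma0, grid=np.linspace(0.80, 0.999, 100)):
    """Pick the forgetting factor minimizing D(alpha) over the window Y."""
    best_alpha, best_cost = None, np.inf
    for alpha in grid:
        mu, sigma, cost = mu0.copy(), sigma0.copy(), 0.0
        for y in Y:
            diff = y - mu                        # distance uses the (t-1) estimates
            cost += float(diff @ np.linalg.inv(sigma) @ diff)
            mu = alpha * mu + (1 - alpha) * y    # then update recursively
            sigma = alpha * sigma + (1 - alpha) * np.outer(diff, diff)
        if cost < best_cost:
            best_alpha, best_cost = alpha, cost
    return best_alpha
```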