2006 International Joint Conference on Neural Networks Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada July 16-21, 2006
Hidden Markov Models and Gaussian Mixture Models for Bearing Fault Detection Using Fractals
T. Marwala, U. Mahola and F. V. Nelwamondo

Abstract: Bearing vibration signal features are extracted using a time domain fractal based feature extraction technique. This technique uses the Multi-Scale Fractal Dimension (MFD) estimated using the Box-Counting Dimension. The extracted features are then used to classify faults using Gaussian Mixture Models (GMM) and Hidden Markov Models (HMM). The results obtained show that the proposed feature extraction technique does extract fault specific information. Furthermore, the experimentation shows that HMM outperforms GMM. However, the disadvantage of HMM is that it is computationally expensive to train compared to GMM. It is therefore concluded that the proposed framework substantially improves the performance of bearing fault detection and diagnosis, but it is recommended to use the GMM classifier when training time is the major issue.
Manuscript received November 30, 2005. T. Marwala is a professor of Electrical Engineering at the School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa (phone: +27 11 717 7217; fax: +27 11 403 1929; e-mail: [email protected]). U. Mahola and F. V. Nelwamondo are also with the School of Electrical and Information Engineering at the University of the Witwatersrand, Johannesburg, South Africa (e-mail: [email protected] [email protected]).
0-7803-9490-9/06/$20.00/©2006 IEEE

I. INTRODUCTION
Rotating machinery is widely used in industry, for system maintenance and even process automation. Research shows that most failures of these machines are linked with bearing failures [1]. Bearing faults induce high bearing vibrations, which generate noise and may even cause the entire rotating machine, such as a motor, to function incorrectly. Thus, it is important to include bearing vibration fault detection and diagnosis in industrial motor rotational fault diagnosis systems [1]. As a result, there is a high demand for cost effective automatic monitoring of bearing vibrations in industrial motor rotational systems.

A variety of bearing vibration fault feature detection techniques have been proposed, which can be classified into three domains: frequency domain analysis, time-frequency domain analysis and time domain analysis [2]. The frequency domain methods often involve frequency analysis of the vibration signals and look at the periodicity of high frequency transients. This procedure is complicated by the fact that this periodicity may be suppressed [2]. The most commonly used frequency analysis technique for detection and diagnosis of bearing faults is envelope analysis. More details on this technique are found in McFadden and Smith [3]. The main disadvantage of frequency domain analysis is that it tends to average out transient vibrations and therefore becomes more sensitive to background noise. To overcome this problem, time-frequency domain analysis is used, since it shows how the frequency content of the signal changes with time. Examples of such analyses are the Short Time Fourier Transform (STFT), the Wigner-Ville Distribution (WVD) and, most importantly, the Wavelet Transform (WT). These techniques are studied in detail in [4]. The last category of feature detection is time domain analysis. There are a number of time domain methods which give reasonable results, such as the time series averaging method, the signal enveloping method, the Kurtosis method and many more [4]. Research shows that, unlike frequency domain analysis, time domain analysis is less sensitive to suppression of the impact periodicity [2],[4].

This paper introduces a new time domain analysis method, fractal dimension analysis, which was originally used in image processing and has recently been used in speech recognition [5]-[7]. The analysis uses the Multi-scale Fractal Dimensions (MFD) of short-time bearing vibration segments, derived from nonlinear theory [7]. It is well suited to bearing fault detection and diagnosis since it extracts the non-linear vibration features of each bearing fault.

The general procedure for bearing fault diagnosis is to extract features from the bearing vibration signal using one or more of the three domains mentioned above. These features can then be used for automatic motor bearing fault detection and diagnosis by applying them to a non-linear pattern classifier. The most popular classifier used in bearing fault detection is the Neural Network (NN). However, other non-linear classifiers such as the Gaussian Mixture Model (GMM) and the Hidden Markov Model (HMM) have been shown to outperform NN in a number of classification problems, mostly in speech related problems. Only recently have researchers such as Purushothama et al. [8] applied speech pattern classifiers such as HMM to fault detection of mechanical systems, motivated by their success in speech recognition.
Since no research has been carried out previously on GMM for bearing fault detection and diagnosis, this paper conducts a comparative study of HMM and GMM and presents time domain analysis based techniques using fractals to extract the features. Furthermore, the ability of MFD to detect bearing faults is evaluated using both HMM and GMM non-linear pattern classifiers. The rest of the paper is arranged as follows: The next section presents in detail different bearing faults studied in this paper followed by the mathematical background of fractal dimensions, HMM and GMM. Thereafter, the proposed time domain bearing detection and diagnosis framework is presented.
II. MOTOR BEARING FAULTS
Vibration measurement is important in advanced condition monitoring of mechanical systems. Most bearing vibrations are periodic movements. The geometry of the bearing is shown in Fig. 1.

Fig. 1. Ball bearing geometry with diameter, D [4]

Generally, the rolling bearing contains two concentric rings, which are called the inner and outer raceway, as shown in this figure [4]. Furthermore, the bearing contains a set of rolling elements that run in the tracks of these raceways. There are a number of standard shapes of rolling elements, such as the ball, cylindrical roller, tapered roller, needle roller, symmetrical and unsymmetrical barrel roller and many more [9]. In this paper, a ball rolling element is used. Fig. 1 also shows the cage, which ensures uniform spacing and prevents mutual contact.

Three faults are studied herein: the inner raceway fault, the outer raceway fault and the rolling element fault. A bearing fault increases the rotational friction of the rotor and therefore each fault generates vibration spectra with unique frequency components [2]. It should be noted that these frequency components are a linear function of the running speed. Additionally, the two raceway frequencies are also linear functions of the number of balls. The motor bearing condition monitoring system is implemented by analyzing the vibration signals of all the bearing faults. The vibration signal is produced by the impact pulse generated when a ball hits a defect in the raceways, or each time a defect in the ball hits the raceways [4].

III. FRACTAL DIMENSION
Most motor bearing vibrations are periodic movements with some degree of turbulence. To detect different bearing faults, these non-linear turbulence features must therefore be extracted. The non-linear turbulence features of the bearing signal are quantified using a fractal model [5]. To define the fractal dimension, let the continuous real-valued function $s(t)$, $0 \le t \le T$, represent a short-time vibration signal. Furthermore, let the compact planar set [6]

$$F = \{(t, s(t)) \in \mathbb{R}^2 : 0 \le t \le T\} \tag{1}$$

represent the graph of this function. The fractal dimension of the compact planar set $F$ is called the Hausdorff dimension and it is generally between one and two [6]. The problem with this dimension is that it is only a mathematical concept and is therefore extremely hard to compute. For this reason, other methods are used to approximate it, such as the Minkowski-Bouligand dimension and the Box-Counting dimension [6]. In this study, the fractal dimension is approximated using the Box-Counting dimension, which is discussed further in the next section.

A. Box-Counting Dimension
The Box-Counting dimension ($D_B$) of $F$ is obtained by partitioning the plane with a grid of squares of side $\varepsilon$ and counting the number $N(\varepsilon)$ of squares that intersect $F$. It is defined as [10]

$$D_B(F) = \lim_{\varepsilon \to 0} \frac{\ln N(\varepsilon)}{\ln(1/\varepsilon)} \tag{2}$$

Assuming a discrete bearing vibration signal $s_1, s_2, \ldots, s_T$, $D_B$ is given by [7]
$$D_B(F) = \frac{J \sum_{j=1}^{J} \ln(1/\varepsilon_j)\,\ln N(\varepsilon_j) - \sum_{j=1}^{J} \ln(1/\varepsilon_j) \sum_{j=1}^{J} \ln N(\varepsilon_j)}{J \sum_{j=1}^{J} \left(\ln(1/\varepsilon_j)\right)^2 - \left(\sum_{j=1}^{J} \ln(1/\varepsilon_j)\right)^2} \tag{3}$$

where $J$ is the number of computation resolutions, as explained in [7], and $\varepsilon_{\min} \le \varepsilon_j \le \varepsilon_{\max}$, with $\varepsilon_{\max}$ and $\varepsilon_{\min}$ representing the maximum and minimum resolutions of computation. In (3), $D_B$ is equal to the slope obtained by fitting a line to $\ln N(\varepsilon_j)$ against $\ln(1/\varepsilon_j)$ using the least squares method [6].

B. Multi-Scale Fractal Dimension (MFD)
It should be noted that the fractal dimension discussed in the last section is a global measure and therefore does not represent all the fractal characteristics of the vibration signal [7]. To overcome this limitation of the global fractal dimension, the Multi-scale Fractal Dimension set is created. The MFD, $D(s, t)$, of the vibration signal $s$ is obtained by computing the dimensions over a small time window. This MFD set is obtained by dividing the bearing vibration signal into $K$ frames; $K$ maximum computation resolutions are then set as [7]

$$\varepsilon_{\max}^{k} = k\,\varepsilon_{\min}, \quad 1 \le k \le K \tag{4}$$

where $\varepsilon_{\min}$ is, as before, the minimum valid resolution of the computation. The Box-Counting dimension in (3) can then be written as [10]

$$D^{k}(F) = \frac{k \sum_{j=1}^{k} \ln\!\left(\frac{1}{j\varepsilon_{\min}}\right) \ln N(j\varepsilon_{\min}) - \sum_{j=1}^{k} \ln\!\left(\frac{1}{j\varepsilon_{\min}}\right) \sum_{j=1}^{k} \ln N(j\varepsilon_{\min})}{k \sum_{j=1}^{k} \left(\ln\!\left(\frac{1}{j\varepsilon_{\min}}\right)\right)^2 - \left(\sum_{j=1}^{k} \ln\!\left(\frac{1}{j\varepsilon_{\min}}\right)\right)^2} \tag{5}$$

Finally, the corresponding MFD of the vibration signal is given by [7]

$$\mathrm{MFD}(s) = \{D^{1}(s), D^{2}(s), \ldots, D^{K}(s)\} \tag{6}$$

where $D^{k}(s)$ is the fractal dimension of the $k$th frame. This set is called the fractogram [6].
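The least-squares estimate in (3) and the multi-scale set in (5)-(6) can be illustrated with a short Python sketch. It is not the implementation of [7]: it assumes that the segment is rescaled to the unit square and that boxes are counted column by column from the minimum and maximum sample in each column. The helper names `box_count`, `fractal_dimension` and `mfd` are hypothetical. The set starts at k = 2 here, since a fit through a single scale has no defined slope.

```python
import numpy as np


def box_count(s, eps):
    """Count eps-sized boxes touched by the graph of s.

    Assumption: time and amplitude are both rescaled to [0, 1], and the
    graph inside one time column covers every box between its lowest and
    highest sample.
    """
    s = np.asarray(s, dtype=float)
    span = s.max() - s.min()
    y = (s - s.min()) / span if span > 0 else np.zeros_like(s)
    t = np.linspace(0.0, 1.0, len(s))
    # Column index of every sample; the sample at t = 1 joins the last column.
    cols = np.minimum((t / eps).astype(int), int(np.ceil(1.0 / eps)) - 1)
    count = 0
    for c in np.unique(cols):
        seg = y[cols == c]
        count += int(seg.max() // eps) - int(seg.min() // eps) + 1
    return count


def fractal_dimension(s, eps_list):
    """Least-squares slope of ln N(eps) against ln(1/eps), as in (3)."""
    x = np.log(1.0 / np.asarray(eps_list, dtype=float))
    y = np.log([box_count(s, e) for e in eps_list])
    J = len(x)
    num = J * np.sum(x * y) - np.sum(x) * np.sum(y)
    den = J * np.sum(x ** 2) - np.sum(x) ** 2
    return num / den


def mfd(s, eps_min, K):
    """Multi-scale set: entry k fits the scales eps_min, ..., k * eps_min, as in (5)."""
    # k = 1 is skipped because one point does not determine a slope.
    return np.array([
        fractal_dimension(s, eps_min * np.arange(1, k + 1))
        for k in range(2, K + 1)
    ])
```

With this counting rule a flat segment gives a dimension close to one, while a segment of white noise gives a value much closer to two, which matches the range quoted for the Hausdorff dimension.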
IV. HIDDEN MARKOV MODEL (HMM)
HMM is a stochastic signal model, also referred to as a Markov source or a probabilistic function of a Markov chain [11]. This model has mostly been applied to speech recognition systems and only recently to bearing fault detection. Recent examples include, among others, Purushothama et al. [8] and Baruah and Chinnam [11]. In a hidden Markov model the observation is a probabilistic function of the state, so the resulting model is a doubly embedded stochastic process whose underlying stochastic process is not observable [11]. This hidden process can only be observed through another set of stochastic processes that produce the observation sequence. There are a number of possible Markov models, but the left-to-right or Bakis model is usually used in speech recognition and is also used in this study. A hidden Markov model is defined as

$$\lambda = \{A, B, \pi\} \tag{7}$$

where $\lambda$ is the model and $A = \{a_{ij}\}$, $B = \{b_j(k)\}$ and $\pi = \{\pi_i\}$ are the transition probability distribution, the observation probability distribution and the initial state distribution, respectively. For states $S_i$ and $S_j$, these parameters are defined as [8]

$$a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i), \quad 1 \le i, j \le N \tag{8}$$

$$b_j(k) = P(o_k \mid q_t = S_j), \quad 1 \le j \le N,\ 1 \le k \le M \tag{9}$$

and

$$\pi_i = P(q_1 = S_i), \quad 1 \le i \le N \tag{10}$$

where $q_t$ is the state at time $t$ and $N$ denotes the number of states. Furthermore, $o_k$ is the $k$th observation and $M$ is the number of distinct observations.

There are three basic problems to be solved before this model can be used in practice. Firstly, evaluation, which finds the probability that the observation sequence $O = o_1, o_2, \ldots, o_T$ of visible states was generated by the model $\lambda$. Using the model in (7), this probability is computed as [11]

$$P(O \mid \lambda) = \sum_{\text{all } S} \pi_{S_0} \prod_{t=0}^{T-1} a_{S_t S_{t+1}}\, b_{S_{t+1}}(o_{t+1}) \tag{11}$$

by using the Forward algorithm. Secondly, decoding, which finds the state sequence that maximizes the probability of the observation sequence; this can be realized by the so-called Viterbi algorithm [12]. Lastly, training, which adjusts the model parameters to maximize the probability of the observed sequence. This last step is simply a problem of determining the reference model for all bearing faults. A more detailed explanation of HMM training using Baum-Welch re-estimation, along with other features of the hidden Markov model, is presented in [12].

V. GAUSSIAN MIXTURE MODEL (GMM)
The GMM non-linear pattern classifier also works by creating a maximum likelihood model for each motor bearing fault, given by [13]

$$\lambda = \{w, \mu, \Sigma\} \tag{15}$$

where $\lambda$ is the model and $w$, $\mu$ and $\Sigma$ are the weights, means and diagonal covariances of the features. Given a collection of training vectors, the parameters of this model are estimated by one of a number of algorithms, such as the Expectation-Maximization (EM) algorithm, the K-means algorithm and many more [13]. In this study, the EM algorithm is used since it has a reasonably fast computational time when compared to other algorithms. The EM algorithm finds the optimum model parameters by iteratively refining the GMM parameters to increase the likelihood of the estimated model for the given bearing fault feature vector. More details on the EM algorithm for training a GMM are given in [14]. Bearing fault detection or diagnosis using this classifier is then achieved by computing the likelihood of the unknown vibration segment for the different fault models. The most likely fault is given by [12]

$$\hat{s} = \arg\max_{1 \le f \le F} \sum_{k=1}^{K} \log p(x_k \mid \lambda_f) \tag{16}$$

where $F$ represents the number of faults to be diagnosed, $X = \{x_1, x_2, \ldots, x_K\}$ is the unknown $D$-dimensional bearing fault vibration segment and $p(x_k \mid \lambda_f)$ is the mixture density function given by [12]

$$p(x_k \mid \lambda) = \sum_{i=1}^{M} w_i\, p_i(x_k) \tag{17}$$

with

$$p_i(x_k) = \frac{1}{(2\pi)^{D/2}\, |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (x_k - \mu_i)^{T} \Sigma_i^{-1} (x_k - \mu_i) \right\} \tag{18}$$

The mixture weights $w_i$ satisfy the constraint

$$\sum_{i=1}^{M} w_i = 1 \tag{19}$$
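The two likelihood computations of Sections IV and V can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the experiments. The HMM part assumes discrete observation symbols and the common convention of [12], in which the first state emits the first observation. The GMM part assumes diagonal covariances stored as per-dimension variances. The names `hmm_log_likelihood`, `gmm_log_likelihood` and `diagnose` are hypothetical.

```python
import numpy as np


def hmm_log_likelihood(obs, pi, A, B):
    """Forward algorithm with per-step scaling: log P(O | lambda).

    obs: sequence of observation symbol indices in 0..M-1 (assumed discrete)
    pi:  (N,) initial state distribution
    A:   (N, N) transitions, A[i, j] = P(q_{t+1} = S_j | q_t = S_i)
    B:   (N, M) emissions,   B[j, k] = P(o_k | q_t = S_j)
    """
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for o in obs[1:]:
        scale = alpha.sum()
        log_lik += np.log(scale)          # accumulate the scaling factors
        alpha = (alpha / scale) @ A * B[:, o]
    return log_lik + np.log(alpha.sum())


def gmm_log_likelihood(X, w, mu, var):
    """Sum over frames of log p(x_k | lambda), following (16)-(18).

    X: (K, D) feature vectors, w: (M,) mixture weights,
    mu: (M, D) means, var: (M, D) diagonal variances.
    """
    X = np.atleast_2d(X)
    D = X.shape[1]
    diff = X[:, None, :] - mu[None, :, :]                       # (K, M, D)
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.log(var).sum(axis=1))
    log_comp = log_norm - 0.5 * (diff ** 2 / var).sum(axis=2)   # (K, M)
    log_mix = np.logaddexp.reduce(np.log(w) + log_comp, axis=1)
    return log_mix.sum()


def diagnose(score_by_fault):
    """Return the fault label whose model has the highest log-likelihood."""
    return max(score_by_fault, key=score_by_fault.get)
```

In use, one model is trained per bearing condition, every model is scored on the unknown segment, and `diagnose` picks the label with the largest score, which mirrors the arg max in (16).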
VI. PROPOSED BEARING MONITORING FRAMEWORK
The architecture of the proposed motor bearing fault detection and diagnosis framework is shown in Fig. 2. As shown in this figure, the framework consists of two major stages after the vibration signal measurements: preprocessing with feature extraction, and classification. These stages are discussed in more detail below.

A. Pre-processing and Feature Extraction
The first essential stage of any automatic fault detection and diagnosis system is signal preprocessing and feature extraction. As discussed in Section II, defects cause a change in the machinery vibration levels and therefore information concerning the health or status of the monitored machine is largely contained in the vibration time signal [14]-[15]. The
signal is preprocessed by first dividing the vibration signals into T windows of equal length. For this method to work adequately, the width of the window must be more than one revolution of the bearing, to ensure that the uniqueness of each vibration fault signal is captured. The preprocessing is followed by extraction of the features of each window using the Box-Counting MFD, which forms the observation sequence to be used by the GMM or HMM classifier, as discussed in Section III. The time domain analysis extracts the non-linear turbulence information of the vibration signal and is expected to substantially improve the performance of bearing fault detection and diagnosis.

Fig. 2. Proposed motor bearing fault detection and diagnosis system

B. Classification Methods
Due to the large variations and the dynamic nature of the vibration signal, direct comparison of the signals is difficult. Hence, non-linear pattern classification methods are used to classify the different bearing fault conditions. The extracted features serve as input to the classification stage of the framework. This paper investigates the performance of the GMM and HMM classifiers. For the GMM classifier, Principal Component Analysis (PCA) is applied to the feature vector before training to reduce its dimension and remove redundant information [16]. This is aimed at reducing the complexity of the model. The main idea behind PCA is to determine the features that explain as much of the total variation in the data as possible with as few of these features as possible. The computation of the PCA data transformation matrix is based on the eigenvalue decomposition. The principal components are computed as follows [16]:
1. Calculate the covariance matrix of the input data.
2. Compute the eigenvalues and eigenvectors of the covariance matrix.
3. Retain the largest eigenvalues and the corresponding eigenvectors, which together account for at least 90% of the variation in the data.
4. Project the original data onto the reduced set of eigenvectors and thus reduce the dimension of the data.
For more information on the PCA used here to reduce the dimension of the feature space, the reader is referred to Jolliffe [16].

Fig. 2 also shows that both GMM and HMM work by building models for all possible fault types and the normal condition, as discussed in Sections IV and V. Diagnosis of the motor bearing fault is achieved by calculating the probability of the feature vector given each of the previously constructed fault models. The GMM or HMM with maximum probability then determines the bearing condition.

VII. EXPERIMENTATION
This section discusses the experimentation database used to evaluate the efficiency of the proposed approach. The performance measure adopted during experimentation is also briefly discussed.
A. Database
The database used to validate the new bearing fault diagnosis framework discussed in the last section was developed at Rockwell Science Centre by Loparo [17]. In this database, single point faults with diameters of 7 mils, 14 mils and 21 mils (1 mil = 0.001 inches) were introduced using electro-discharge machining. These faults were introduced separately at the inner raceway, rolling element (i.e. ball) and outer raceway. A more detailed explanation of this database is presented in [17]. The experiments were performed for each fault diameter and repeated for two load conditions, 1 and 2 horsepower. The experimentation was performed on vibration signals sampled at 12,000 samples per second for the drive end bearing faults. The vibration signals from this database were all divided into equal windows of four revolutions. Half of the resulting sub-signals were used for training and the other half for testing.

VIII. RESULTS
The optimum HMM architecture used for experimentation is a 2 state model with a diagonal covariance matrix that contains 10 Gaussian mixtures. The GMM architecture also used a diagonal covariance matrix, with 3 centers. The main advantage of using the diagonal covariance matrix in both cases is that this de-correlates the feature vectors. This is necessary since fractal dimensions are highly correlated values [6].

The first set of experiments measures the effectiveness of the time domain fractal dimension based feature extraction using the vibration signals of the faults shown in Fig. 3. The figure shows the first 2 seconds of the vibration signals used, and it can be clearly seen that there is fault specific information which must be extracted.
Fig. 3. The first 2 seconds of the vibration signal of the normal, inner raceway fault, ball fault and outer raceway fault conditions

Fig. 4 shows the MFD feature vector, which extracts the bearing fault specific information. It should be noted that these features are only for the first second of the vibration signal. Fig. 4 clearly shows that the proposed feature extraction technique does indeed extract the fault specific features which are used to classify different bearing faults. For this reason, the proposed MFD feature extraction is expected to substantially improve the performance of bearing fault detection and diagnosis. However, the optimum size of the MFD must first be found.

[Fig. 4 plot: Box-counting MFD (1 to 1.8) against normalized time (0 to 0.9) for the normal condition, inner raceway fault, ball fault and outer raceway fault]
Fig. 4. MFD feature extraction comparison for the normal, inner, outer and ball fault for the 1 s vibration signal.

The change in system accuracy with the MFD size is shown in Fig. 5. This figure shows that the GMM generally has a larger optimum MFD size, 13, compared to 5 for the HMM.

[Fig. 5 plot: classification accuracy (30 to 100) against MFD size (2 to 20) for HMM and GMM, with the optimum size for GMM and the optimum size for HMM marked]
Fig. 5. The change in classification rate with change in MFD size

Using the optimum HMM and GMM architectures discussed previously, the classification accuracy obtained for different bearing loads and different bearing fault diameters is presented in Table 1 for both the GMM and HMM classifiers. The table shows that the HMM outperforms the GMM classifier in all cases, with 100% and 99.2% classification for HMM and GMM, respectively. Table 1 also shows that changing the bearing load or fault diameter does not significantly change the classification rate.

Table 1: The classification rate for different loads and fault diameters for both the GMM and HMM classifiers

Using a Pentium IV with a 2.4 GHz processor, further experimentation showed that the average training time of HMM is 19.5 seconds, which is more than 20 times higher than the GMM training time of 0.83 seconds. In summary, even though HMM gives a higher classification rate than GMM, its models are time consuming to train. It is worth mentioning that using the PCA dimension reduction technique discussed in Section VI does not affect the classification rate. However, it reduced the dimension from 84 to 11, making GMM training even more computationally efficient when compared to training HMM.
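The four PCA steps of Section VI, which the results report as reducing the GMM feature dimension from 84 to 11, can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: it assumes that the 90% criterion refers to the cumulative share of total variance, and the function name `pca_reduce` and its `variance_kept` parameter are hypothetical.

```python
import numpy as np


def pca_reduce(X, variance_kept=0.90):
    """Project the rows of X onto the leading principal components.

    Assumption: components are retained until their eigenvalues sum to at
    least `variance_kept` of the total variance. X needs two or more columns.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre the data
    cov = np.cov(Xc, rowvar=False)               # step 1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # step 2: eigen-decomposition
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    n_keep = int(np.searchsorted(ratio, variance_kept)) + 1   # step 3
    W = eigvecs[:, :n_keep]
    return Xc @ W, W                             # step 4: projection
```

The returned matrix `W` has to be stored with the trained models, so that test segments can be centred and projected in the same way before they are scored.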
IX. DISCUSSION
In this paper, results obtained using the MFD short-time feature extraction were presented. The results demonstrate that the proposed time domain feature extraction technique extracts the fault specific features. The results further show that, for the GMM classifier, using PCA does not affect the classification rate; it simply reduces the dimension of the input feature vector, which makes the GMM models less complex than the HMM models. Further experimentation revealed that there is an optimum MFD size which gives the optimum classification rate. From the results obtained it was found that GMM generally has a larger optimum MFD size than HMM. The second set of tests compared the performance of GMM and HMM in classifying the different bearing faults. The tests revealed that the HMM outperforms the GMM classifier, with a classification rate of 100%. Further testing of these classifiers revealed that the major disadvantage of the HMM classifier is that it takes longer to train than GMM, even though GMM has a larger MFD size than HMM. For this reason it is recommended to use the GMM classifier when training time is the major issue in a particular application. It was further observed that changing the bearing load or fault diameter does not significantly affect the classification rate of the proposed framework.

X. CONCLUSION AND FUTURE WORK
A framework was presented in this paper that uses time domain fractal based feature extraction to extract the non-linear turbulent information of the vibration signal. Using these features together with HMM and GMM classifiers, the results showed that the HMM classifier outperforms the GMM classifier, with HMM giving a 100% and GMM a 99.2% classification rate. However, the major drawback of the HMM classifier is that it is computationally expensive, taking more than 20 times longer than the GMM to train.

REFERENCES
[1] X. Lou, K. A. Loparo, F. M. Discenzo, J. Yoo, A. Twarowski, “A model-based technique for rolling element bearing fault detection”, Mechanical Systems and Signal Processing, vol. 18, pp. 1077-1095, 2004.
[2] S. Ericsson, N. Grip, E. Johansson, L.E. Persson, R. Sjöberg, J.O. Strömberg, “Towards automatic detection of local bearing defects in rotating machines”, Mechanical Systems and Signal Processing, vol. 19, pp. 509-535, 2004.
[3] P. D. McFadden, J. D. Smith, “Vibration monitoring of rolling element bearings by high frequency resonance technique - a review”, Tribology International, vol. 77, pp. 3-10, 1984.
[4] B. Li, M.Y. Chow, Y. Tipsuwan, J. C. Hung, “Neural-network-based motor rolling bearing fault diagnosis”, IEEE Transactions on Industrial Electronics, vol. 47, pp. 1060-1068, 2000.
[5] P. Maragos, F.K. Sun, “Measuring the fractal dimension of signals: Morphological covers and iterative optimization”, IEEE Transactions on Signal Processing, vol. 41, pp. 108-121, 1993.
[6] P. Maragos, A. Potamianos, “Fractal dimensions of speech sounds: computation and application to automatic speech recognition”, Journal of the Acoustical Society of America, vol. 105, no. 3, pp. 1925-1932, 1999.
[7] F. Wang, F. Zheng, W. Wu, “A C/V segmentation for Mandarin speech based on multi-scale fractal dimension”, International Conference on Spoken Language Processing, vol. 4, pp. 648-651, 2000.
[8] V. Purushothama, S. Narayanana, Suryanarayana, A.N. Prasadb, “Multi-fault diagnosis of rolling bearing elements using wavelet analysis and hidden Markov model based fault recognition”, NDT&E International, vol. 38, pp. 654-664, 2005.
[9] H. Ocak, K. A. Loparo, “Estimation of the running speed and bearing defect frequencies of an induction motor from vibration data”, Mechanical Systems and Signal Processing, vol. 18, pp. 515-533, 2004.
[10] K. Falconer, Fractal geometry; mathematical foundations and application, New York: John Wiley; 1952.
[11] P. Baruah, R. B. Chinnam, “HMMs for diagnostic and prognostics in machining process”, International Journal of Production Research, vol. 43, pp. 1275-1293, 2005.
[12] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition”, Proceedings of IEEE, vol. 77, pp. 257-286, 1989.
[13] A. Dempster, N. Laird, D. Rubin, “Maximum likelihood from incomplete data via the EM algorithm”, Journal of Royal Statistical Society, vol. 39, pp. 1-38, 1977.
[14] Y.F. Wang, P.J. Kootsookos, “Modeling of low shaft speed bearing faults for condition monitoring”, Mechanical Systems and Signal Processing, vol. 12, pp. 415-426, 1998.
[15] K. McClintic, M. Lebold, K. Maynard, C. Byington, R. Campbell, “Residual and difference feature analysis with transitional gearbox data”, Proceedings of the 54th Meeting of the Society for Machinery Failure Prevention Technology, Virginia Beach, VA, pp. 635-645, 2000.
[16] I.T. Jolliffe, Principal component analysis, Springer-Verlag, New York, 1986.
[17] K.A. Loparo, Bearing data center seeded fault test data, http://www.eecs.case.edu/-laboratory/bearing/download.htm, last accessed 21 November 2005.