entropy Article
Multi-Scale Permutation Entropy Based on Improved LMD and HMM for Rolling Bearing Diagnosis Yangde Gao 1 , Francesco Villecco 2 , Ming Li 3,4 and Wanqing Song 1, * 1
2 3 4
*
Joint Research Lab of Intelligent Perception and Control, School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, 333 Long Teng Road, Shanghai 201620, China;
[email protected] Department of Industrial Engineering, University of Salerno, Via Giovanni Paolo II 132, 84084 Fisciano, Italy;
[email protected] School of Information Science and Technology, East China Normal University, Shanghai 200241, China;
[email protected] Ocean College, Zhejiang University, Hangzhou 316021, China Correspondence:
[email protected] or
[email protected]; Tel.: +86-21-6779-1126
Academic Editor: Carlo Cattani Received: 8 January 2017; Accepted: 14 April 2017; Published: 19 April 2017
Abstract: Based on the combination of improved Local Mean Decomposition (LMD), Multi-scale Permutation Entropy (MPE) and Hidden Markov Model (HMM), the fault types of bearings are diagnosed. Improved LMD is proposed based on the self-similarity of roller bearing vibration signal by extending the right and left side of the original signal to suppress its edge effect. First, the vibration signals of the rolling bearing are decomposed into several product function (PF) components by improved LMD respectively. Then, the phase space reconstruction of the PF1 is carried out by using the mutual information (MI) method and the false nearest neighbor (FNN) method to calculate the delay time and the embedding dimension, and then the scale is set to obtain the MPE of PF1. After that, the MPE features of rolling bearings are extracted. Finally, the features of MPE are used as HMM training and diagnosis. The experimental results show that the proposed method can effectively identify the different faults of the rolling bearing. Keywords: improved LMD; multi-scale permutation entropy; MI; FNN; delay time; embedding dimension; HMM; back-propagation (BP); bearing fault diagnosis
1. Introduction Rolling bearings are the most important parts of rotating machinery, which are easily damaged by load, friction and damping in the course of operation [1]. Therefore, the feature extraction and pattern recognition of bearings diagnosis are very important. The time-frequency analysis method has been widely used in faults diagnosis, because it can provide the information in the time and frequency domain [2]. Moreover, there are many methods of artificial intelligence detection, such as statistical processing to sense [3], stray magnetic flux measurement [4], and neural networks such as support vector machine (SVM) [2]. When the wavelet basis function of the wavelet transform (WT) [5] is scheduled to be different, the decomposition of the time series will have great influence. When the wavelet base is selected, WT has no self-adaptability at different scales [2,6]. Empirical mode decomposition (EMD) [6] can involve the complex multi-component signal adaptive decomposition of the sum of several intrinsic mode function (IMF) components. Further, the Hilbert transform of each IMF component is used to obtain the instantaneous frequency [7,8] and instantaneous amplitude. The aim of this is to obtain the original signal integrity of the time-frequency
Entropy 2017, 19, 176; doi:10.3390/e19040176
www.mdpi.com/journal/entropy
Entropy 2017, 19, 176
2 of 10
distribution [9], but there are still some problems, such as: over-envelope, under-envelope [10], modal aliasing and endpoint effect in the EMD method. The Local Mean Decomposition (LMD) [2] method realizes signal decomposition and demodulation by constructing the local average and envelope estimation functions, which effectively solves the problem of the over-envelope, under-envelope and modal aliasing in the EMD method [6]. Compared with WT and EMD, LMD has the nature of self-adaptive feature and the LMD endpoint effect involves a certain degree of inhibition, while solving the problem of under-envelope and over-envelope. Researchers have found that the endpoint effect processing has been improved by LMD [11–13], but the impact of the endpoint effect is still very obvious, and it cannot be proved that extreme ones use the left or the right points, so we can not explain how the envelop curve fitting function of the extreme value may be reasonable [5]. Researchers have found that the LMD edge effects can be solved by the extreme point extension and the mirror extension [9]. The boundary waveform matching prediction method contains the Auto-Regressive and ARMA prediction [7]. Both sides of the waveform are just extended by these methods, but these methods do not take into account the inner rules or characteristics of signals. The new improved LMD takes inner discipline into account, the results of experiments show that the new method is more flexible than the existing method. After the improved LMD decomposition, we have to solve a major important aspect, namely, how to extract the fault information from the obtained PF components. Many studies have been carried out to solve the problem. In recent years, mechanical equipment fault diagnosis is used more than non-linear analysis methods such as sample entropy and approximate entropy [14–16]. Permutation Entropy (PE) [15–17] is a new non-linear quantitative description method, which can be a natural system of irregularities and a non-linear system in the quantitative numerical method presented, and has the advantages of simple calculation and strong anti-noise ability. However, similar to the traditional single-scale nonlinear parameters, PE is still in a single scale to describe the irregularity of time series. Aziz et al. [18] proposed the concept of MPE. MPE can be used to measure the complexity of time series under different scales, and MPE is more robust than other methods [18]. However, mechanical system signal randomness and noise interference are poor. The MPE method is used for signal processing, and the detection result is unsatisfactory. Therefore, MPE is used in combination with LMD, forming a new high-performance algorithm, which can effectively enhance the effectiveness of MPE in signal feature extraction. In this paper, the rolling bearing vibration signals were decomposed into several PF components by improved LMD. Then, the phase space reconstruction of the PF1 is carried out by using the mutual information (MI) and the false nearest neighbor (FNN) to calculate the delay time and the embedding dimension, and then the scale is set to obtain the MPE of PF1. After that, the feature vectors are extracted as the HMM and back-propagation (BP) neural network input, and then the training diagnosis and fault diagnosis are compared. 2. Improved LMD Method and Phase Space Reconstruction of MPE The process of improved LMD is summarized in four sub-processes, first we set up three points to invest a triangular waveform, and then we list all start points (it will takes some time to search the characteristic waveform), finally we get the best integration interval. After that we will study the shape error parameters to get the best extension of signal. Finally, we extend the right end through the above method, and the extended signal will be better. Figure 1 will show that the edge effect has become better, the end of the time-frequency representation changes significantly. Therefore, the experiment shows that the improved LMD has greatly improved the reduction of the border.
Entropy 2017, 19, 176 Entropy 2017, 19, 176
3 of 10 3 of 10
Entropy 2017, 19, 176
3 of 10
x(1) − m1 − n1
x(1) − m1 − n1
tm tn − tn1tmi txi = 1 i tm tm1tn tntn11tmi 1− i − txi =
e(i ) = e(i ) =
tm1 − tn1
mi − x(tmi ) n − mi + i tm − tx tn i i i − tmi m − x(tm ) n −m i
i
+
i
i
m −m tm1 i −− xtx(itm1 ) tni −ntm + 1 i 1 tmx1(tm − tx tm1 m1 − n1 tn − 1m− 1 )1 1 tm1 − tx1
+
tn1 − tm1
Figure 1. The process of the improved Local Mean Decomposition (LMD).
Figure 1. The process of the improved Local Mean Decomposition (LMD).
In Figure 2, the vibration signals of the rolling bearing are decomposed into several PF
In Figure2, 2, vibration signals the rolling bearing are decomposed into several components bythe the improved LMD Then, the phase space reconstruction thecomponents PF1 is PF In Figure the vibration signals ofrespectively. theof rolling bearing are decomposed into severalof PF components by the improved LMD respectively. Then, the phase space reconstruction of the PF1by is carried out by using the MI and the FNN to calculate the delay time and embedding dimension, and by the improved LMD respectively. Then, the phase space reconstruction of the PF1 is carried out then we can set the scale to obtain the MPE of PF1. carriedthe out using MItoand the FNN calculate the delay time and embedding using MIbyand the the FNN calculate the to delay time and embedding dimension, and dimension, then we canand set then we can set the scale to obtain the MPE of PF1. the scale to obtain the MPE of PF1. k x (t ) = PFp (t ) + u (t ) k p =1
k x (t ) = PFp (t ) + u (t ) k p =1
I(x, y) = h(x) + h(y) −h(x,y) il (l ) 1 y = x i = 1, 2, , n / l i l j= (i−1) l+1 j
il (l ) 1 y = x i = 1, 2, , n / l i l j= (i−1) l+1 j
I(x, y) = h(x) + h(y) −h(x,y) (r ) R2 ( n, r ) = R 2 ( n, r ) + x ( n + m t ) − x ( n + m t ) m +1 m
2
2
(r )
2
2
R ( n, r )of= Multi-scale R ( n, r ) + xPermutation ( n + m t ) − x (Entropy n + m t ) (MPE). Figure 2. LMD method and Phase Space Reconstruction m +1 m
In Figure 3, MI [17–20] determines the optimal delay time. x = {x i , i = 1, 2, , N } represents a group the probability density of the point is p x [ x ( i )] , the signal is mapped to the Figureof 2.signals, LMD method and Phase Spacefunction Reconstruction of Multi-scale Permutation Entropy (MPE). probability y = { y j , j = 1, 2,, N } .
In Figure Figure MI [17–20] determines the optimal delay time. x{ x = {, xi i = , i p=1, 1, x2,(·i The two groups of determines signals x, ythe are measured probability .represents In the ), j})]} represents xy [2, In 3,3,MI [17–20] optimal delay time. x =density · ·,y,N( N aa i the signal is mapped mapped to the the groupformula: of signals, probability density of correspond topoint of signal the entropy in theto h ( xthe ) and h ( y ) respectively x ( i )is )]j ),, the group of signals, the probability density function function of the the point isand ppxx[[xx((yii()] is specified system measure probability y y= = 2,· ·, N · ,}N.the. average amount of information; h( x, y) is a joint information = {y jj,, jjand = 1,1,2, entropy. The two groups of signals x, y are measured probability density p xy [ x (i ), y( j)]. In the formula: The two groups of signals x, y are measured probability density p xy [ x ( i ), y ( j )] . In the h( x ) and h(y) respectively correspond to x (i ) and y( j) of the entropy in the specified system and correspond formula: the h ( average x ) and amount h ( y ) respectively x (information i ) and y ( entropy. j ) of the entropy in the measure of information; h( x, y) is atojoint specified system and measure the average amount of information; h( x, y) is a joint information entropy.
Entropy 2017, 19, 176
4 of 10
Entropy 2017, 19, 176 Entropy 2017, 19, 176
4 of 10 4 of 10
Figure 3. Mutual information method. Figure 3. 3. Mutual information information method. method. Figure
In In Figure Figure 4, 4, FNN FNN [21–24] [21–24] is is used used to to calculate calculate the the minimum minimum embedding embedding dimension dimension effective effective In Figure 4, FNN [21–24] is calculate theaminimum embedding dimension effective method. method. Time series dimension space state space x (used n ) toconstructs method. Time series constructs a dimension space state space vector vector x(n) Time series x ( n ) constructs a dimension space state space vector Y ( n ) = ( x ( n ) , x ( n + t ) , · · · , x (n + r Y ( n ) = ( x ( n ), x ( n + t ), , x ( n + ( m − 1)t )) . Using Y r( n ) to represent the Y (n) nearest neighbor, the . Using to represent the nearest neighbor, r Y ( n ) = ( x ( n ), x ( n + t ), , x ( n + ( m − 1) t )) Y ( n ) ( n ) neighbor, the distance between them can the be (m − 1)t)). Using Y (n) to represent the Y (n) Ynearest distance between them can be calculated by the formula. distance between them can be calculated by the formula. calculated by the formula.
Figure 4. 4. False False nearest nearest neighbor neighbor method method for Embedding Dimension. Figure Figure 4. False nearest neighbor method for Embedding Dimension.
When coordinate is added ) added When the embedded dimensionchanges changesfrom from m m toto + ++1,11a,, anew to When the embedded dimension dimension changes from tom m m a new newcoordinate coordinateY (YYn()(nnis ) is added r to each component of the vector . At this time, the distance between and x (+ n+ mt ) At Y(n( n) )and Ynr)r( nis) the each component of the vector x ( n mt ) . this time, the distance between Y Y ( to each component of the vector x ( n + mt ) . At this time, the distance between Y ( n ) and Y ( n ) is is relative increment of the distance, so they are false neighbors. the relative increment of the distance, so they are false neighbors. the relative increment of the distance, so they are false neighbors. calculation the project RRtol ≥ 10, found that that the the false false neighboring neighboring points points can can be In , ititisis found In the the calculation of of the project Rtoltol ≥≥10 10 , it is found that the false neighboring points can be easily identified. At the time r = 1, we call the nearest neighbors, by computing each nearest neighbor easily we call call the the nearest nearest neighbors, neighbors, by by computing computing each each nearest nearest easily identified. identified. At At the the time time rr ==11 ,, we of the trajectory. neighbor of the trajectory. neighbor of the trajectory. In order order to make the MPE have better fault recognition effect, it is necessary to useuse the MI MI and the In In order to to make make the the MPE MPE have have better better fault fault recognition recognition effect, effect, itit is is necessary necessary to to use the the MI and and FNN to optimize the delay time and the embedding dimension when calculating the MPE in Figure 2. the the FNN FNN to to optimize optimize the the delay delay time time and and the the embedding embedding dimension dimension when when calculating calculating the the MPE MPE in in The MPE is calculated based on the optimized parameters. Figure Figure 2. 2. The The MPE MPE is is calculated calculated based based on on the the optimized optimized parameters. parameters. 3. Diagnosis Diagnosis Flow Based Based on HMM HMM 3. 3. Diagnosis Flow Flow Based on on HMM (1) The The characteristic characteristicindex indexofof bearing failure degree is extracted from the signal fault signal of the (1) bearing failure degree is extracted from the fault of the needle (1) needle The characteristic index of bearing failure degree is extracted from the fault signal of the needle roller bearingdifferent with different degrees of damage,the and the feature index is normalized roller bearing degrees of damage, index is and rollerquantized bearing with with different degrees ofestablished, damage, and and the feature feature index is normalized normalized and and [24]. When the HMM is the sequence of observations should be a quantized [24]. When the HMM is established, the sequence of observations should be a finite quantized [24]. When the HMM is established, the sequence of observations should be a finite finite discrete value,the and the discretized valuebecan be used as the model training eigenvalue discrete value, discrete value, and and the discretized discretized value value can can be used used as as the the model model training training eigenvalue eigenvalue after after after quantization. quantization. quantization. (2) Diagnosis Diagnosis FlowBased Based onHMM HMM [25]ininFigure Figure 5.The The Baum–Welch algorithm [26,27] is used (2) (2) Diagnosis Flow Flow Based on on HMM [25] [25] in Figure 5. 5. The Baum–Welch Baum–Welch algorithm algorithm [26,27] [26,27] is is used used to to to train, adjust and optimize the parameters of the observation sequence, so that the observed train, train, adjust adjust and and optimize optimize the the parameters parameters of of the the observation observation sequence, sequence, so so that that the the observed observed sequence of probability values in the the observed sequence is similarsimilar to the observed value sequence. sequence sequence of of probability probability values values in in the observed observed sequence sequence is is similar to to the the observed observed value value We calculate the maximum, HMM stateHMM identification, different fault levels fault of thelevels state to establish sequence. We calculate the maximum, state identification, different of the sequence. We calculate the maximum, HMM state identification, different fault levels of thestate state to to establish establish the the corresponding corresponding HMM, HMM, the the unknown unknown fault fault state state data data and and in in turn turn enter enter the the various various models, models, calculate calculate and and compare compare the the likelihood. likelihood. The The output output probability probability of of the the largest largest model model is is the the
Entropy 2017, 19, 176
5 of 10
Entropy 2017, 19, 176
5 of 10
the corresponding HMM, dataprobable and in turn the various models, unknown signal fault type.the It isunknown estimatedfault that state the most pathenter through the sequence is 5 of 10 calculate and compare the likelihood. The output probability of the largest model is the unknown observed by the Baum–Welch algorithm. unknown signal fault type. It is estimated that the most probable path through the sequence is signal fault type. It is estimated that the most probable path through the sequence is observed by observed by the Baum–Welch algorithm. the Baum–Welch algorithm.
Entropy 2017, 19, 176
P = ln P (O/ λ ) i i P = ln P (O/ λ ) i i
Figure 5. Diagnosis Flow Based on Hidden Markov Model (HMM). Figure 5. Diagnosis Flow Based on Hidden Markov Model (HMM). Figure 5. Diagnosis Flow Based on Hidden Markov Model (HMM).
4. Experimental Data Analysis 4. Experimental Data Analysis In Figure 6, in order to validate the proposed method, the experimental data of different rolling 4. Experimental Data Analysis bearings are analyzed. Thetoexperimental data are analyzed the Case Western University. In Figure 6, in order validate the proposed method,from the experimental dataReserve of different rolling In Figure 6, in order to validate the proposed method, the experimental data of different rolling The bearing of the drive end is thedata 6205-2RS SKF deep ballWestern bearings, and the loaded bearings are analyzed. Theshaft experimental are analyzed fromgroove the Case Reserve University. bearings are analyzed. The experimental data are analyzed from the Case Western Reserve University. 2.33 motor. Thedrive pitch shaft diameter ball groupSKF is 40deep mm groove and theball contact angle and is 0, the the loaded power The kW bearing of the end of is the 6205-2RS bearings, The bearing of the drive shaft end is the 6205-2RS SKF deep groove ball bearings, and the loaded of motor is 1.47 kW. Sampling frequency is 12 kHz and the relevant drive motor speed is 1750 rpm 2.33 kW motor. The pitch diameter of the ball group is 40 mm and the contact angle is 0, the power of 2.33 kW motor. The pitch diameter of the ball group is 40 mm and the contact angle is 0, the power with Hp load, numberfrequency of sampling n = the 2048. The collected signals canisbe1750 divided into motor2 is 1.47 kW.the Sampling is 12points kHz and relevant drive motor speed rpm with of motor is 1.47 kW. Sampling frequency is 12 kHz and the relevant drive motor speed is 1750 rpm four including Normal,points IRF, ORF, and The BF. Each condition a total of 2 Hpfault load,types the number of sampling n = 2048. collected signalshas can16 besamples divided and into four fault with 2 Hp load, the number of sampling points n = 2048. The collected signals can be divided into 64 samples. types including Normal, IRF, ORF, and BF. Each condition has 16 samples and a total of 64 samples. four fault types including Normal, IRF, ORF, and BF. Each condition has 16 samples and a total of 64 samples.
Figure Figure 6. 6. The The rolling rolling bearing bearing experiment experiment system. system. Figure 6. The rolling bearing experiment system. The normal state of the bearing, inner ring fault, outer ring fault and rolling body fault are tested. The normal state of the bearing, inner ring fault, outer ring fault and rolling body fault are tested. Due to limited space, only the improved LMD decomposition of the inner ring fault is listed in normal stateonly of the inner ring fault, outer ring fault andring rolling faultinare tested. Due toThe limited space, thebearing, improved LMD decomposition of the inner faultbody is listed Figure 7. Figure 7. We can find that improved LMD decomposition is performed for each group of time domain Due tofind limited onlyLMD the decomposition improved LMDisdecomposition of thegroup innerofring is listed in We can that space, improved performed for each timefault domain signal, signal, which sets the amount of change. The multi-component AM-FM signal is decomposed into a Figuresets 7. We find that improved LMD decompositionAM-FM is performed group of into timeadomain which thecan amount of change. The multi-component signalfor is each decomposed single single component AM-FM signal, the decomposition of the PF component and the signal correspond signal, which sets the amount change. The multi-component AM-FM signal is decomposed into component AM-FM signal, the of decomposition of the PF component and the signal correspond to thea to the corresponding components, which reflect the various components of the signal in the presence single component AM-FM signal, decomposition the PF component theinsignal correspond corresponding components, whichthe reflect the variousofcomponents of the and signal the presence of of different characteristics of components. Each PF component has a physical meaning, reflecting the to the corresponding components, whichEach reflect various components of the signal inreflecting the presence different characteristics of components. PFthe component has a physical meaning, the original signal of the real information, so we get a series of PF components. of different characteristics of components. component has a physical meaning, reflecting the original signal of the real information, so weEach get aPFseries of PF components. We can obtain the signal after improved LMD decomposition, which can effectively remove original signal of the real information, so we get a series of PF components. noise, and extract important signals. PF1 is more than 80% of the original information. PF1 can We can obtain the signal after improved LMD decomposition, which can effectively remove effectively reflect the original signal of the complete information characteristics. Therefore, we use noise, and extract important signals. PF1 is more than 80% of the original information. PF1 can the PF1 to calculate the MPE. Not only can we improve the computational efficiency, we can also effectively reflect the original signal of the complete information characteristics. Therefore, we use reflect the original information characteristics. the PF1 to calculate the MPE. Not only can we improve the computational efficiency, we can also reflect the original information characteristics.
Entropy 2017, 19, 176
6 of 10
Entropy 2017, 19, 176
6 of 10
Figure Figure 7. 7. Inner Innerfault faultisisdecomposed decomposed by by improved improved LMD. LMD.
According to the of theLMD MPE,decomposition, the phase space is reconstructed. Delay timenoise, and We can obtain thecalculation signal aftersteps improved which can effectively remove embedding are two main the MPEinformation. algorithm. It PF1 is determined that and extract dimension important signals. PF1 isparameters more thaninfluencing 80% of the original can effectively these two parameters have certain principal randomness. Therefore, we need algorithms for phase reflect the original signal of the complete information characteristics. Therefore, we use the PF1 to space reconstruction. Phase reconstruction methods mainly have thewe MIcan and thereflect FNN the to calculate the MPE. Not only space can we improve the computational efficiency, also determine two parameters independently and the (C-C) joint algorithm. We found that MPE obtained original information characteristics. by independent parameters has a steps betterofmutation important PF1 component According to the calculation the MPE,detection the phaseeffect. spaceSo is the reconstructed. Delay time and can be selected by the MI and the FNN to calculate the delay time and the embedding dimension. embedding dimension are two main parameters influencing the MPE algorithm. It is determined After that we reconstruct thehave phase space.principal randomness. Therefore, we need algorithms for that these two parameters certain We can set the MI parameter, Max_tau = 100, the program default maximum Partthe = 128 and phase space reconstruction. Phase space reconstruction methods mainly have thedelay MI and FNN to the program default box size. The delay time is calculated by the MI, as shown in Figure On the basis determine two parameters independently and the (C-C) joint algorithm. We found that 3. MPE obtained of delay time, false nearestdetection neighboreffect. method is important adopted toPF1 optimize the by determining independent the parameters has athe better mutation So the component embedding dimension, where the maximum embedding dimension is set to 12, the criterion 1 is set can be selected by the MI and the FNN to calculate the delay time and the embedding dimension. to 20, criterion 2 is set to 2, false neighbor rate with the embedding dimension of the curve as shown After that we reconstruct the phase space. in Figure 4. When the embedding dimension is 4, the FNN rate is no longer reduced with the increase We can set the MI parameter, Max_tau = 100, the program default maximum delay Part = 128 and of the embedding dimension, so the embedding dimension is set to m = 4. Using the same principle, we the program default box size. The delay time is calculated by the MI, as shown in Figure 3. On the can reconstruct the four faults of the rolling bearing with phase space, and calculate the delay time basis of determining the delay time, the false nearest neighbor method is adopted to optimize the and the embedding dimension. Due to the limited space, we only list part of the delay time and embedding dimension, where the maximum embedding dimension is set to 12, the criterion 1 is set to embedding dimension, and Table 1 reflects the partial delay time and the embedding dimension. 20, criterion 2 is set to 2, false neighbor rate with the embedding dimension of the curve as shown in Figure 4. When the embedding dimension is 4, the FNN rate is no longer reduced with the increase Table 1. The partial delay time and the embedding dimension. of the embedding dimension, so the embedding dimension is set to m = 4. Using the same principle, Section Normal we can reconstruct the four faults ofBF the rollingOBF bearing withIBF phase space, and calculate the delay time and the embeddingtdimension. Due to the limited space, we 1 1 1 only list part 2 of the delay time and embedding dimension, and the partial dimension. m Table 1 reflects 6 4 delay time 5 and the embedding 7 Table 1. The partial delay time and the embedding dimension.
After the phase space reconstruction, the delay time and embedding dimension are used to compute the MPE. We define the scale factor S and calculate the MPE, select the PF1 component of Section BF OBF IBF Normal the data length N = 2048, the scale S = 14. Therefore, we can calculate the MPE. In Figure 8, the MPE t 1 1 1 2 of PF2 is compared with MPE of PF1. We can see that PF1 is better than PF2, so we choose PF1. We m 6 4 5 7 can see that the rolling body faults and the inner ring faults are well distinguished, but the outer ring faultsAfter cannot bephase very good. there isthe a need to time identify model, it isdimension HMM. After MPE the spaceTherefore, reconstruction, delay andthe embedding arethe used to is calculated, the feature vectors are grouped into an Eigenvectors group for the input of HMM and compute the MPE. We define the scale factor S and calculate the MPE, select the PF1 component of the the neuralNnetwork, and trained andTherefore, predicted we for can different faults. dataBPlength = 2048, the scale S = 14. calculate the MPE. In Figure 8, the MPE of
Entropy 2017, 19, 176
7 of 10
PF2 is compared with MPE of PF1. We can see that PF1 is better than PF2, so we choose PF1. We can see that the rolling body faults and the inner ring faults are well distinguished, but the outer ring faults cannot be very good. Therefore, there is a need to identify the model, it is HMM. After the MPE is calculated, the feature vectors are grouped into an Eigenvectors group for the input of HMM and the BP neural network, and trained and predicted for different faults. Entropy2017, 2017,19, 19,176 176 Entropy
of10 10 77of
(a) (a)
(b) (b) Figure8.8.(a) (a)MPE MPEof offour fourfaults; faults;(b) (b)MPE MPEof ofPF2. PF2. Figure Figure 8. (a) MPE of four faults; (b) MPE of PF2.
Thetraining trainingcurve curveisisshown shownin inFigure Figure9.9.HMM HMMmodeling modelingand andtraining, training,rolling rollingbody bodyfault, fault,inner inner The The training curve is shown in Figure 9. HMM modeling and training, rolling body fault, ringfault, fault,outer outerring ringfault, fault,and andthe thenormal normalbearing bearingrepresent representfour fourkinds kindsof ofhidden hiddenstate, state,recorded recorded as ring as inner ring fault, outer ring fault, and the normal bearing represent four kinds of hidden state, recorded λ1,λ2, λ2,λ3, λ3,λ4, λ4,respectively. respectively.The Theobservation observationsequence sequenceof ofthe themodel modelisisthe thesix sixmulti-scale multi-scalepermutation permutation λ1, as λ1, λ2,feature λ3, λ4,sequences respectively. The observation sequence of the the is the six multi-scale permutation entropy feature sequences extracted extracted above. In In addition, themodel state transition transition probability matrix Bλ Bλ entropy above. addition, state probability matrix entropy feature sequences extracted above. In addition, the state transition probability matrix Bλ showsthe theinitial initialprobability. probability.The Thedistribution distributionvector vectorπλ πλisisgenerated generatedby bythe theRand Randrandom randomfunction, function, shows shows initialprocess probability. The distribution vector πλ is by the Rand function, andthe thethe training process normalized. TheMarkov Markov chain ingenerated HMMisisdescribed described byππrandom andA. A.Different Different and training isisnormalized. The chain in HMM by and the process is normalized. The HMM is described by π and A.the Different πand and Atraining determine itsshape. shape. Thefeatures features ofMarkov Markovchain chainin are asfollows: follows: must start from the initial πand A determine its The of Markov chain are as ititmust start from initial π and A determine its shape. The features of Markov chain are as follows: it must start from the state,along alongthe thestate statesequence, sequence,the thedirection directionof oftransfer transferincreases, increases,and andthen thenititstops stopsin inthe thefinal finalstate. state. state, initial state, along the state sequence, the direction of transfer increases, and then it stops in the final Themodel modelbetter betterdescribes describesthe thesignal signalin inaacontinuous continuousmanner mannerover overtime. time.The Thelevel levelof ofbearing bearingfailure failure The state. The model better describes the signal in a continuous manner over time. The level of bearing establishes HMM. HMM. Because Because there there are are four four kinds kinds of of fault fault state, state, we we choose choose four four HMM HMM states, states, and and the the establishes failure establishesN Because are four kinds ofas fault state, wesample chooseisisfour HMM states, model parameter parameter NHMM. chosen. Eachthere statetakes takes 11groups groups as thetraining training sample used togenerate generate model isis chosen. Each state 11 the used to and the model parameter Nremaining is chosen.sets Each state takes 11 groups asas the training sample is used HMM under four states,the the remaining sets offeature feature vectors areused used as test samples. Thetraining training HMM under four states, of vectors are test samples. The to generate HMM under four states, the remaining sets of feature vectors are used as test samples. algorithm used used isis the the Baum–Welcm Baum–Welcm algorithm. algorithm. We We set set the the number number of of iterations iterations to to 100 100 and and the the algorithm The training algorithm used isIn the Baum–Welcm algorithm. set the number of iterations to 100 and convergence error to to 0.001. 0.001. In the model of of training, training, theWe maximum likelihood estimation value convergence error the model the maximum likelihood estimation value the convergence error tonumber 0.001. In the model oftraining training,ends the maximum likelihood estimation value increases theiteration iteration number increases. The training ends whenthe theconvergence convergence errorcondition condition increases ssthe increases. The when error increases as the iteration number increases. The training ends when the convergence error condition satisfied. isissatisfied. is satisfied.
Figure 9. Number of HMM iterations. Figure9. 9.Number Numberof ofHMM HMMiterations. iterations. Figure
In order order to to prevent prevent the the model model from from getting getting into into an an infinite infinite loop loop or or training training model model failure, failure, the the In trainingalgorithm algorithmuses usesthe theLloyd Lloydalgorithm algorithmto toscalar scalarquantify quantifythe theentropy entropysequence, sequence,and andthe thescalar scalar training quantization sequence sequence isis input input into into the the HMM. HMM. The The training training algorithm algorithm uses uses the the Baum–Welch Baum–Welch quantization algorithm. As As the the number number of of iterations iterations increases, increases, the the maximum maximum log-likelihood log-likelihood estimate estimate increases increases algorithm. untilconvergence convergenceisisreached. reached.After Aftertraining, training,four fourhidden hiddenHMM HMMrecognition recognitionmodels modelscan canbe beobtained. obtained. until
Entropy 2017, 19, 176
8 of 10
In order to prevent the model from getting into an infinite loop or training model failure, the training algorithm uses the Lloyd algorithm to scalar quantify the entropy sequence, and the scalar quantization sequence is input into the HMM. The training algorithm uses the Baum–Welch algorithm. As the number of iterations increases, the maximum log-likelihood estimate increases until convergence is reached. After training, four hidden HMM recognition models can be obtained. Figure 9 shows the HMM training curves for four states. It can be seen that all the states reach convergence at 20 iterations and converge quickly. After the training HMM models, the remaining 20 sets of entropy eigenvectors, fives states of each state, are taken as test samples. Before the test, the sample entropy sequence is scaled by Lloyd algorithm, and the scaled quantized sample entropy sequences are inputted to each state. After the completion of HMM training in four states, a state classifier is built. The Eigenvalue sequences of the remaining four states is calculated by the Forward–Backward algorithm at λ1, λ2, λ3, and λ4 Model. The log-likelihood ln P(O|λ) of this sequence is generated. Log likelihood probability ln P(O|λ) reflects the degree of similarity between feature vector and HMM, the larger the value, the closer the observed feature sequence is to the HMM of the state. The feature sequence corresponds to the output log likelihood probability value. The maximum model corresponds to the fault state. In the HMM model, each model outputs a log likelihood estimate. The log likelihood estimate represents the degree of similarity between the feature vector and each HMM. The larger the log likelihood estimate, the closer the feature vector in this state HMM model, the recognition algorithm is the Baum–Welch Algorithm. The log likelihood ratio obtained from the training model inputted to this state is larger than the log likelihood probability value obtained from the other state inputs. In Table 2 it can be seen that the higher the failure degree of the same fault type, the higher the log likelihood estimate is, and the output of natural logarithm probability estimate in each state is the maximum under this state. Under these four states, the bearing diagnosis and recognition show no category error. The HMM-like fault diagnosis model can successfully identify the fault type of bearings, recognition accuracy and stability. Table 2. The part of the sample to be tested identifies the result. Logarithm Likelihood Probabilities of the Input Sample Model
Fault Condition
λ1
λ2
λ3
λ4
Recognition Result
BF OBF IBF Normal
−9.75854 −157.594 −∞ −∞
−24.5979 −15.1449 −55.9402 −24.0932
−∞ −∞ −9.33311 −59.6398
−∞ −∞ −792.054 −5.63077
λ1 λ2 λ3 λ4
In order to compare improved LMD model recognition results, the experiment is different. The training results of HMM are compared with those of BP neural network, the recognition results are shown in Table 3. We set BP some parameters as follows: trainParam_Show = 10; trainParam_Epochs = 1000; trainParam_mc = 0.9; trainParam_Lr = 0.05; trainParam_lrinc = 1.0; trainParam_Goal = 0.1. Compared with LMD, the overall accuracy of the improved LMD model is excellent. Table 3. Comparison of improved LMD for Fault Recognition. Recognition Model Improved LMD
LMD
BF
OBF
IBF
Normal
Recognition Rate
HMM
5
5
4
5
95.0%
BP
5
4
5
4
90.0%
HMM
5
4
4
5
90.0%
BP
5
4
4
4
85.0%
Entropy 2017, 19, 176
9 of 10
5. Conclusions Based on the combination of improved LMD, MPE and HMM model can still achieve higher recognition when the number of exercises is small. In practical application, the new useful method can be used for training samples, so the new model is more suitable for practical fault diagnosis. (1)
(2) (3)
Improved LMD for bearing non-stationary signal processing has a strong signal frequency domain recognition capability. The experiment shows that the improved LMD has greatly improved the reduction of the border. The improved LMD can effectively remove the excess noise and extract important information. The MI and the FNN can effectively reconstruct the space, reflecting the multi-scale permutation entropy and the mutation performance under different scales. HMM model of the various states of the bearings can be trained, and successfully diagnoses the bearing fault features. Improved LMD and HMM has a high recognition rate and it is very suitable for a large amount of information, and non-stationarity of the characteristic repeatability fault signal.
Acknowledgments: This project is supported by the Shanghai Nature Science Foundation of China (Grant No. 14ZR1418500) and the Foundation of Shanghai University of Engineering Science (Grand No. k201602005). Ming Li acknowledges the supports in part by the National Natural Science Foundation of China under the Project Grant No. 61672238 and 61272402. Author Contributions: Yangde Gao and Wanqing Song conceived and designed the topic and the experiments; Yangde Gao and Wanqing Song analyzed the data; Ming Li made suggestions for the paper organization, and Yangde Gao wrote the paper. Francesco Villecco made some suggestions both for the paper organization and for some improvements. Finally, Wanqing Song made the final guide to modify. All authors have read and approved the final manuscript. Conflicts of Interest: The authors declare no conflict of interest.
References 1. 2.
3. 4. 5. 6. 7. 8. 9. 10. 11. 12.
Li, T.; Xu, M.; Wang, R.; Huang, W. A fault diagnosis scheme for rolling bearing based on local mean decomposition and improved multiscale fuzzy entropy. J. Sound Vib. 2016, 360, 277–299. [CrossRef] Li, T.; Xu, M.; Wei, Y.; Huang, W. A new rolling bearing fault diagnosis method based on multiscale permutation entropy and improved support vector machine based binary tree. Measurement 2016, 77, 80–94. [CrossRef] Henao, H.; Capolino, G.A.; Fernandez-Cabanas, M.; Filippetti, F. Trends in fault diagnosis for electrical machines: A review of diagnostic techniques. IEEE Ind. Electron. Mag. 2014, 8, 31–42. [CrossRef] Frosini, L.; Harli¸sCa, C.; Szabó, L. Induction machine bearing fault detection by means of statistical processing of the stray flux measurement. IEEE Trans. Ind. Electron. 2015, 62, 1846–1854. [CrossRef] Immovilli, F.; Bellini, A.; Rubini, R.; Tassoni, C. Diagnosis of bearing faults in induction machines by vibration or current signals: A critical comparison. IEEE Trans. Ind. Appl. 2010, 46, 1350–1359. [CrossRef] Sun, H.; Zhang, X.; Wang, J. Online machining chatter forecast based on improved local mean decomposition. Int. J. Adv. Manuf. Technol. 2016, 84, 1–12. [CrossRef] Liu, W.T.; Gao, Q.W.; Ye, G.; Ma, R.; Lu, X.N.; Han, J.G. A novel wind turbine bearing fault diagnosis method based on Integral Extension LMD. Measurement 2015, 74, 70–77. [CrossRef] Smith, J.S. The local mean decomposition and its application to EEG perception data. J. R. Soc. Interface 2005, 2, 443–454. [CrossRef] [PubMed] Li, Y.; Xu, M.; Zhao, H.; Yu, W.; Huang, W. A new rotating machinery fault diagnosis method based on improved local mean decomposition. Digit. Signal Process. 2015, 46, 201–214. [CrossRef] Lei, X.; Xun, L.; Chen, J.; Alexander, H.; Su, H. Time-varying oscillation detector base Don improved robust l EMP El-Z IV complexity. Control Eng. Pract. 2016, 51, 48–57. Liu, W.Y.; Zhou, L.Q.; Hu, N.N.; He, Z.Z.; Yang, C.Z.; Jiang, J.L. A novel integral extension LMD method based on integral local waveform matching. Neural Comput. Appl. 2016, 27, 761–768. [CrossRef] Wei, G.; Huang, L.; Chen, C.; Zou, H.; Liu, Z. Elimination of end effects in local mean decomposition using spectral coherence and applications for rotating machinery. Digit. Signal Process. 2016, 55, 52–63.
Entropy 2017, 19, 176
13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27.
10 of 10
Shi, Z.; Song, W.; Taheri, S. Improved LMD, permutation entropy and optimized k-means to fault diagnosis for roller bearings. Entropy 2016, 18, 70. [CrossRef] Yang, Z.-X.; Zhong, J.H. A hybrid eemd-based sampen and svd for acoustic signal processing and fault diagnosis. Entropy 2016, 18, 112. [CrossRef] Zhang, X.; Liang, Y.; Zhou, J.; Zang, Y. A novel bearing fault diagnosis model integrated permutation entropy, ensemble empirical mode decomposition and optimized SVM. Measurement 2015, 69, 164–179. [CrossRef] Rohit, T.; Gupta, K.V.; Kankar, P.K. Bearing fault diagnosis based on multi-scale permutation entropy and adaptive neuro fuzzy classifier. J. Vib. Control 2015, 21, 461–467. Fan, C.-L.; Jin, N.-D.; Chen, X.-T.; Gao, Z.-K. Multi-scale permutation entropy: A complexity measure for discriminating two-phase flow dynamics. Chin. Phys. Lett. 2013, 30, 90501–90505. [CrossRef] Zhao, L.-Y.; Wang, L.; Yan, R.-Q. Rolling bearing fault diagnosis based on wavelet packet decomposition and multi-scale permutation entropy. Entropy 2015, 17, 6447–6461. [CrossRef] Mohammad, A.S.; Ralf, R.M.; Rachinger, C. Phase modulation on the hypersphere. IEEE Trans. Wirel. Commun. 2016, 15, 5763–5774. Ghasemzadeha, H.; Khassb, M.T.; Arjmandic, M.K.; Pooyanda, M. Detection of vocal disorders based on phase space parameters and Lyapunov spectrum. Biomed. Signal Process. Control 2015, 22, 135–145. [CrossRef] Rakshith, R.; Hanzo, L. Hybrid beamforming in mm-wave MIMO systems having a finite input alphabet. IEEE Trans. Commun. 2016, 64, 3337–3349. Vignesh, R.; Jothiprakash, V.; Sivakumar, B. Streamflow variability and classification using false nearest neighbor method. J. Hydrol. 2015, 531, 706–715. [CrossRef] Hyun, G.; Kim, J. A simple and efficient finite difference method for the phase-field crystal equation on curved surfaces. Comput. Methods Appl. Mech. Eng. 2016, 307, 32–43. Wu, C.L.; Chau, K.W. Data-driven models for monthly stream flow time series prediction. Eng. Appl. Artif. Intell. 2010, 23, 1350–1367. [CrossRef] Zhou, H.; Chen, J.; Dong, G.M.; Wang, R. Detection and diagnosis of bearing faults using shift-invariant dictionary learning and hidden Markov model. Mech. Syst. Signal Process. 2015, 72–73, 65–79. [CrossRef] Yuwono, M.; Qin, Y.; Zhou, J.; Guo, Y.; Celler, B.G.; Su, S.W. Automatic bearing fault diagnosis using particles warm clustering and hidden Markov model. Eng. Appl. Artif. Intell. 2016, 47, 88–100. [CrossRef] Jiang, H.; Chen, J.; Dong, G. Hidden Markov model and nuisance attribute projection base bearing performance degradation assessment. Mech. Syst. Signal Process. 2016, 72–73, 184–205. [CrossRef] © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).