
Study on Mutual Information and Fractal Dimension-Based Unsupervised Feature Parameters Selection: Application in UAVs

Xiaohong Wang 1, Yidi He 1 and Lizhi Wang 2,3,*

1 School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China; [email protected] (X.W.); [email protected] (Y.H.)
2 Institute of Unmanned System, Beihang University, Beijing 100191, China
3 Key Laboratory of Advanced Technology of Intelligent Unmanned Flight System of Ministry of Industry and Information Technology, Beihang University, Beijing 100191, China
* Correspondence: [email protected]; Tel.: +86-138-1043-8269

Received: 16 July 2018; Accepted: 3 September 2018; Published: 5 September 2018

 

Abstract: Redundant and irrelevant features contained in a multi-dimensional feature parameter set reduce the information fusion performance of subspace learning algorithms. To solve this problem, a mutual information (MI) and fractal dimension-based unsupervised feature parameters selection method was proposed. The key to this method was an importance ordering algorithm based on the comprehensive consideration of the relevance and redundancy of features; a fractal dimension-based feature parameter subset evaluation criterion was then adopted to obtain the optimal feature parameter subset. To verify the validity of the proposed method, a brushless direct current (DC) motor performance degradation test was designed. Vibrational sample data acquired during motor performance degradation were used as the data source, and motor health-fault diagnosis capacity and motor state prediction effect were used as evaluation indexes to compare the information fusion performance of the subspace learning algorithm before and after the use of the proposed method. According to the comparison result, the proposed method is able to eliminate highly-redundant parameters that are weakly correlated to the feature parameter set, thereby enhancing the information fusion performance of the subspace learning algorithm.

Keywords: unsupervised feature selection; feature extraction; mutual information; fractal dimension; subspace learning algorithm

1. Introduction

With the development of scientific and technological research, research objects in various fields, such as mechanical engineering, data mining, image processing, information retrieval, and genome engineering, are becoming increasingly complex. Therefore, the volume of experimentally acquired data, such as product fault data, genetic data, and high-definition image information, has increased exponentially, as has the number of feature dimensions [1]. Multidimensional feature parameters usually exhibit sparsity. The information carried by different feature parameters overlaps and complements each other, and data description faces various problems, such as poor overall identifiability, heavy calculation, difficulty in visualization, and incorrect conclusions. To this end, subspace learning algorithms, such as Principal Component Analysis (PCA) [2], Kernel Principal Component Analysis (KPCA) [3], Linear Discriminant Analysis (LDA) [4], Locality Preserving Projections (LPP) [5], and Locally Linear Embedding (LLE) [6], have gradually been applied to information fusion of


multidimensional feature parameters. However, these methods have failed to consider the possible redundant and irrelevant feature parameters in the multidimensional feature parameter space: redundant features might reduce the information fusion efficiency of the subspace learning algorithms, and irrelevant features might undermine their performance. This can eventually lead to reduced information fusion performance of the aforementioned subspace learning algorithms and affect their precision and stability [7]. With a feature selection method, the optimal feature parameter subset can be obtained, and the redundant and irrelevant features in it can be eliminated with minimal information loss, thereby enhancing algorithm performance and saving running time [8].

In view of the problems above, a feature selection method was used to eliminate the redundant and irrelevant features in the feature parameter subset. Considering that data in engineering practice and application is mostly unlabeled, feature selection should be performed using an unsupervised method, since it does not require data labels and selects the feature subset that contains the key properties of the original feature set from a data perspective [1]. Currently, unsupervised feature selection typically combines a search algorithm (such as a genetic algorithm [9,10] or ant colony optimization [11,12]) with a feature parameter subset evaluation criterion (such as fractal dimension [13,14] or rough set theory [15]). However, such methods might reduce the precision of subsequent algorithms, since they suffer from heavy calculation, long running times, and high time complexity, O(2^n) [7]; in certain instances, they might fail to find an optimal solution.

In information theory, entropy is a measure of the uncertainty in a physical system [16,17]. Based on this definition, the information shared by two things, namely the interdependence between them, can be characterized by mutual information (MI). Thus, MI is an effective tool for measuring feature relevance and redundancy. Similar to the "Minimum Redundancy and Maximum Relevance (mRMR)" criterion [18] of the supervised method, the basic idea of an MI-based unsupervised method is also to take the redundancy and relevance of every feature parameter into overall consideration. As the quantitative index of fractal theory, the fractal dimension (FD) measures the similarity of structure between the whole and its parts [13,14,19–21], so the similar properties of the feature parameter set and its subsets can be evaluated using FD.

In this study, a mutual information and fractal dimension-based unsupervised feature selection (UFS-MIFD) method was developed based on the characteristics of MI and FD. To begin with, the feature parameters were linearly ordered by importance according to their maximum "relevance" to the feature parameter set and minimum "redundancy" with respect to the ordered feature set. The optimal feature parameter subset was then selected from the ordered feature parameter set using FD as the criterion of feature subset evaluation. Compared with existing feature selection algorithms, this method not only features linear time complexity, a significantly shortened running time, and a greatly reduced search, but also eliminates the redundant and irrelevant features in the feature parameter set.
Multi-rotor Unmanned Aerial Vehicles (UAVs) represent a new type of UAV with prominent features, such as a simple mechanical structure, convenient use and maintenance, vertical take-off and landing, and rapid deployment, which have led multi-rotor UAVs to be studied and applied in recent years in many fields, such as military surveillance, power line inspection, pesticide spraying, and express delivery. The brushless direct current (DC) motor is the power supply unit of a multi-rotor UAV, and its safety and reliability directly affect the reliability level of the multi-rotor UAV itself. Therefore, in this paper, a brushless DC motor performance degradation test was designed to acquire vibrational signals, which are used as the data source to verify the proposed method from the perspectives of fault diagnosis and state prediction. The investigation of the UFS-MIFD method is outlined in Figure 1. The rest of this paper is organized as follows: the process of UFS-MIFD is presented in Section 2. The brushless DC motor, the test method and process, and vibration signal acquisition and analysis are introduced in Section 3. In Section 4, the original feature parameter set is extracted based on the motor vibration signals, and the implementation of the UFS-MIFD algorithm is introduced. In Section 5, the validity of the proposed UFS-MIFD is verified based on the information fusion result of the output subspace learning algorithm obtained in Section 4 from the perspectives of the motor health-fault diagnosis effect and motor state prediction. Conclusions of this study and prospects for further studies are presented in Section 6.

[Figure 1 outlines the approach: motor vibration signals along the X, Y, and Z axes undergo original feature extraction; MI-based feature importance ranking and fractal dimension-based subset evaluation form the unsupervised feature selection step, yielding an optimal feature subset per axis; subspace learning fuses each subset into 2 comprehensive features per axis and then into 2 comprehensive features of the motor; the results are verified via health-fault diagnosis and state prediction of the motor.]

Figure 1. Paper flowchart.

2. Mutual Information and Fractal Dimension-Based Unsupervised Feature Parameters Selection Method

2.1. Theoretical Basis

2.1.1. Mutual Information (MI)

Mutual information is defined based on information entropy. It measures the interdependence between two features, which means it represents the information shared by both features. Suppose that there is a feature parameter set F comprising n feature parameters f_1, f_2, ..., f_n. According to information entropy theory, the mutual information between feature parameters f_i and f_j can be defined as:

I(f_i; f_j) = H(f_i) - H(f_i | f_j) = I(f_j; f_i)    (1)

where H(f_i) is the information entropy of feature f_i (see Equation (2)) [16,17]; P(f_i) is the probability of feature variable f_i taking its different probable values, which measures the uncertainty of the value of f_i; and H(f_i | f_j) is the conditional entropy (see Equation (3)), which means the uncertainty of f_i when the value of another feature f_j is known:

H(f_i) = -\sum_{f_i} P(f_i) \log P(f_i)    (2)

H(f_i | f_j) = -\sum_{f_j} P(f_j) \sum_{f_i} P(f_i | f_j) \log P(f_i | f_j)    (3)
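For concreteness, the sketch below shows one common way to estimate Equations (1)–(3) from sampled feature vectors by histogram discretization. It is a minimal illustration, not the authors' implementation; the function name, helper structure, and bin count of 16 are assumptions introduced here.

```python
import numpy as np

def mutual_information(f_i, f_j, bins=16):
    """Histogram-based estimate of I(f_i; f_j) = H(f_i) - H(f_i | f_j).

    f_i, f_j: 1-D arrays sampling two feature parameters.
    bins: discretization resolution (an illustrative choice).
    """
    # Joint distribution P(f_i, f_j) estimated from a 2-D histogram.
    joint, _, _ = np.histogram2d(f_i, f_j, bins=bins)
    p_ij = joint / joint.sum()
    p_i = p_ij.sum(axis=1)  # marginal P(f_i)
    p_j = p_ij.sum(axis=0)  # marginal P(f_j)

    # Equivalent form of Eqs. (1)-(3): sum of p(x,y) * log(p(x,y) / (p(x) p(y))).
    mask = p_ij > 0
    outer = np.outer(p_i, p_j)
    return float(np.sum(p_ij[mask] * np.log(p_ij[mask] / outer[mask])))
```

Applied to every pair of feature columns, such a helper yields the kind of pairwise MI matrix visualized later in Figure 7.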

In fact, however, the relevance between the feature parameters in the feature parameter set and their redundant features cannot be measured directly by MI, for which the mRMR criterion of the supervised method is required to measure the relevance and redundancy of features.

2.1.2. Fractal Dimension

Fractals are ubiquitous in nature. Due to the limited number of data points in a data set, a dataset shows fractal features only within a certain scale range, namely when the local distribution and the global distribution of the dataset share a similar structure or similar properties. In this case, it can be analyzed using fractal theory [13,14,19–21]. FD is the quantitative index of fractal theory. There are a variety of methods that can be used to calculate the FD of a dataset, of which the box-counting method is easy to implement and widely used; therefore, FD was calculated using the box-counting method in this paper. With this method, the dataset is covered using hypercubes with a scale of ε, thereby obtaining the FD of the dataset. In the non-scaling interval [ε_1, ε_2], the FD of a feature parameter set X with N dimensions can be calculated using Equation (4):

D(X) = \lim_{ε→0} \frac{\ln N(ε)}{\ln(1/ε)}    (4)

where ε is the side length of the hypercube and N(ε) is the minimum number of hypercubes with side length ε that cover X. The points are plotted in double-logarithmic coordinates based on the equation above, and the least squares method is used to fit over the non-scaling interval [ε_1, ε_2], thus obtaining the FD of the dataset.
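A minimal box-counting sketch of Equation (4) is given below, assuming the data are first normalized to the unit hypercube; the scale grid standing in for the non-scaling interval [ε_1, ε_2] is an illustrative assumption.

```python
import numpy as np

def box_counting_dimension(X, scales=(1/2, 1/4, 1/8, 1/16, 1/32)):
    """Box-counting estimate of the fractal dimension D(X) in Eq. (4).

    X: (n_samples, n_features) array; scales: candidate hypercube side
    lengths epsilon, assumed to lie in the non-scaling interval.
    """
    # Normalize each dimension to [0, 1] so one epsilon grid covers all axes.
    X = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)

    counts = []
    for eps in scales:
        # N(eps): number of occupied hypercubes of side eps covering X.
        cells = np.floor(X / eps).astype(int)
        counts.append(len({tuple(c) for c in cells}))

    # Least-squares fit of ln N(eps) versus ln(1/eps); the slope estimates D(X).
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(scales)), np.log(counts), 1)
    return slope
```

On a fractal-like dataset, the slope of the fitted line in log-log space approximates D(X); the choice of scales corresponds to the non-scaling interval.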

2.2. UFS-MIFD Method

The fundamental theories mentioned in Section 2.1 were extended in this paper, and a UFS-MIFD algorithm was developed by drawing from the mRMR criterion of the supervised method. To begin with, the relevancy, conditional relevancy, and redundancy between feature parameters [7] were defined and calculated. With overall consideration of these quantities, the mRMR criterion for feature parameter importance ordering was obtained, based on which the feature parameters contained in the feature parameter set were ordered by importance. The less important a feature parameter was, the lower its relevancy to the overall feature parameter set and the higher its redundancy. Next, the feature subsets of the ordered parameter set were selected as per the FD-based feature subset evaluation criterion, thereby eliminating the feature parameters with low relevancy and high redundancy from the feature parameter set. The algorithmic process is as follows.

First, the importance ordering of the feature parameters in the n-dimensional original feature parameter set F = [f_1, f_2, ..., f_n] was conducted stepwise. The ordered feature set G was initially left empty.

Step 1: The average MI between the whole feature parameter set F and every feature f_i (i = 1, 2, ..., n) was calculated using Equation (5):

score(f_i) = \frac{1}{n} \sum_{j=1}^{n} I(f_i; f_j)    (5)

Thus, the first important feature in G was g_1 = f_{l_1}, where l_1 = \arg\max_{1 \le i \le n} \{score(f_i)\}. This feature was able to minimize the uncertainty of the rest of the features in F.

Step 2: To obtain the second important feature in G, F = [f_1, f_2, ..., f_n] was replaced by F = [f_1, f_2, ..., f_j, ..., f_{n-1}]. In this case, each feature f_j, where j = 1, 2, ..., n-1, was selected from F to calculate its relevancy Rel(f_j) with F, the conditional relevancy Rel(g_1 | f_j) between g_1 in G and f_j, and the redundancy Red(f_j; g_1) of f_j with respect to g_1, of which Rel(f_j) was defined as the average MI between f_j and F [7]:

Rel(f_j) = \frac{1}{n} \sum_{k=1}^{n} I(f_j; f_k) = \frac{1}{n} \left( H(f_j) + \sum_{1 \le k \le n, j \ne k} I(f_j; f_k) \right)    (6)

where H(f_j) signifies the information f_j contains, and \sum_{1 \le k \le n, j \ne k} I(f_j; f_k) means the information shared by f_j and the other parameters in F. The larger \sum_{1 \le k \le n, j \ne k} I(f_j; f_k) was, the less new information the other parameters could provide. Therefore, if the feature parameter with the largest Rel(f_j) was selected, there would be the least information loss in the corresponding parameter set. The conditional relevancy Rel(g_1 | f_j) between f_j and g_1 could be defined as [7]:

Rel(g_1 | f_j) = \frac{H(g_1 | f_j)}{H(g_1)} Rel(g_1)    (7)

The redundancy Red(f_j; g_1) of f_j with respect to g_1 could be defined as follows [7]:

Red(f_j; g_1) = Rel(g_1) - Rel(g_1 | f_j)    (8)

Thus, the importance evaluation criterion E for feature parameter f_j could be obtained by taking the relevance between f_j and F and the redundancy of f_j with respect to G into overall consideration:

E(f_j) = Rel(f_j) - \max_{g_1 \in G} Red(f_j; g_1)    (9)

Supposing that l_2 = \arg\max_{1 \le j \le n-1} \{E(f_j) | f_j \in F\}, the second feature in G was g_2 = f_{l_2}.

Step 3: Similarly, the original F was replaced by F = [f_1, f_2, ..., f_j, ..., f_{n-p+1}] to obtain the p-th important feature in G. In this case, each feature f_j, where j = 1, 2, ..., n-p+1, was selected from F. The relevance Rel(f_j) between f_j and F, the conditional relevance Rel(g_m | f_j) between g_m in G and f_j, and the redundancy Red(f_j; g_m) of f_j with respect to g_m were calculated using Equations (6)–(8). Thus, the importance evaluation criterion E for feature parameter f_j was obtained by taking the relevance between f_j and F and the redundancy of f_j with respect to G into overall consideration:

E(f_j) = Rel(f_j) - \max_{g_m \in G} Red(f_j; g_m)    (10)

Supposing that l_r = \arg\max_{1 \le j \le n-p+1} \{E(f_j) | f_j \in F\}, the p-th feature in G was g_p = f_{l_r}. Step 3 was repeated until all the feature parameters in the original feature parameter set F were ordered by their importance, that is, until the ordered feature parameter set G was obtained.

Step 4: On that basis, the subsets of the ordered feature parameter set G were selected using the FD-based feature parameter subset evaluation criterion proposed in this study. The main idea was to retain the feature parameter subsets whose local fractal dimension differed from the overall fractal dimension by no more than a certain threshold, eliminating the feature parameter with the least influence on the feature parameter set one at a time. The steps, illustrated by the sketch after this list, are given as follows:

(1) The FD of the N-dimensional ordered feature parameter set G was calculated, denoted as frac(G).
(2) With the N-th feature parameter g_N eliminated from G, the remaining N-1 feature parameters constituted a new feature parameter subset S_{N-1}. To distinguish it from frac(G), the fractal dimension frac(S_{N-1}) of S_{N-1} was named the local fractal dimension. The difference r = frac(G) - frac(S_{N-1}) was then calculated. If |r| ≤ η (η being the threshold parameter), S_{N-1} was considered similar to G: although the N-th feature parameter had been eliminated, its removal made little difference to G, which suggested that the N-th feature parameter was a highly redundant parameter that was weakly correlated to G.
(3) Let frac(G) = frac(S_{N-1}), G = G - {g_N}, and N = N - 1. The calculation in step (2) was continued until |r| > η. At this point, the feature parameter subset was the optimal feature parameter subset.
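The following sketch strings the ordering criterion of Equations (5)–(10) and the threshold loop of Step 4 together, reusing the mutual_information and box_counting_dimension helpers sketched above. It simplifies the bookkeeping (the relevance of Equation (6) is computed once on the full MI matrix rather than on the shrinking set F), so it should be read as an illustration of the procedure, not as the authors' reference implementation.

```python
import numpy as np

def ufs_mifd(F, eta=0.05, bins=16):
    """Sketch of UFS-MIFD. F: (n_samples, n_features) array; returns the
    column indices of the selected optimal subset. eta follows the paper
    (0.05 is the threshold used in Section 4)."""
    n = F.shape[1]
    # Pairwise MI matrix; the diagonal I(f, f) equals the entropy H(f).
    I = np.array([[mutual_information(F[:, a], F[:, b], bins)
                   for b in range(n)] for a in range(n)])

    remaining = list(range(n))
    # Step 1: the first feature maximizes the average MI with the set, Eq. (5).
    first = max(remaining, key=lambda i: I[i, :].mean())
    ordered = [first]
    remaining.remove(first)

    # Steps 2-3: repeatedly pick the feature maximizing E = Rel - max Red,
    # Eqs. (9)-(10).
    while remaining:
        def criterion(j):
            rel_j = I[j, :].mean()  # Rel(f_j), Eq. (6)
            # Red(f_j; g) = Rel(g) * I(g; f_j) / H(g), which follows from
            # Eqs. (7)-(8) because I(g; f_j) = H(g) - H(g | f_j).
            red = max(I[g, :].mean() * I[g, j] / max(I[g, g], 1e-12)
                      for g in ordered)
            return rel_j - red
        best = max(remaining, key=criterion)
        ordered.append(best)
        remaining.remove(best)

    # Step 4: drop trailing features while the fractal dimension barely moves.
    subset = list(ordered)
    frac_g = box_counting_dimension(F[:, subset])
    while len(subset) > 1:
        frac_s = box_counting_dimension(F[:, subset[:-1]])
        if abs(frac_g - frac_s) > eta:  # |r| > eta: stop, subset is optimal
            break
        subset, frac_g = subset[:-1], frac_s
    return subset
```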

The flow diagram of the proposed method is shown in Figure 2.

[Figure 2 charts the algorithm: the feature set F is ranked via the MI-based importance criterion E (average MI for the first feature; relevancy, conditional relevancy, and redundancy for the rest) to produce the ordered feature set G; then frac(G) and the local fractal dimension frac(S_{N-1}) are compared, and while |r| = |frac(G) - frac(S_{N-1})| ≤ η the N-th feature is removed, yielding the optimal feature subset.]

Figure 2. Process of the mutual information and fractal dimension-based unsupervised feature selection algorithm.

3. Motor Vibration Data Acquisition and Signal Analysis

In this paper, the power motor (the U8 disc-type brushless DC motor from T-MOTOR) of an unmanned multi-rotor gyroplane was taken as the research object, based on which a test was designed to monitor the vibrational signals during motor operation. The vibrational signals were used as the sample data for verifying the proposed method and characterizing motor performance degradation. The test system is shown in Figure 3. The working process was as follows: the single-chip microcomputer, controlled by the control module of the computer, sent pulse-width modulation (PWM) signals to the digital speed regulator that controlled motor operation; motor vibration signals along the X, Y, and Z-axes were acquired using the acceleration sensor and then stored in the storage module of the computer. The modules of the test system were powered by the system power unit.

[Figure 3 depicts the test system: a computer (control and storage modules), a single-chip microcomputer, an electronic speed regulator, the motor with acceleration sensors, and a system power unit.]

Figure 3. Motor degradation test system.

This motor performance degradation test was carried out at a 22.2 V rated operating voltage and 100% throttle. The test conditions are shown in Table 1.

Table 1. Conditions of the motor performance degradation test.

Motor Model: U8; KV Value: 170; Voltage: 22.2 V; Current: 27 A; Throttle: 100%; Rotation Speed: 2300 rpm; Sampling Direction: X, Y, Z axes; Sampling Frequency: 12.8 kHz; Blade: 28 × 9.2

KV value represents the increased speed per volt.

This motor performance degradation test lasted 1062 h, during which 1416 sample signals (each lasting 0.5 s) were captured and recorded at a time interval of 45 min along the X, Y, and Z-axes. As shown in Figure 4, the motor sample under test ran basically stably during 0–1016 h, but an abrupt change of its operating state was observed during 1017–1062 h. This abnormality continued without any sign of weakening or disappearing. As shown in Figure 5, electron microscopy revealed noticeable abrasion on the surfaces of the inner and outer bearing races and on the bearing balls of the motor sample under test, which indicated that the motor sample under test had failed. Therefore, the motor vibration data acquired during 0–1016 h was taken as the initial input data.

[Figure 4 plots the vibration amplitude against time over 0–1200 h, with the abrupt state change marked at 1016–1062 h.]

Figure 4. Operating states of the testing motor.

Figure 5. Scanning Electron Microscopy (SEM) images of the motor bearing. (a) Outer surface of inner bearing race; (b) Inner surface of outer bearing race; (c) Bearing ball 1; (d) Bearing ball 2.

4. Motor Vibration Feature Extraction and Selection

The features of the vibrational data acquired during motor operation were extracted from the perspectives of degradation description and life evaluation. In this study, the feature parameter extraction methods included the time domain feature parameter extraction method [22], the frequency domain feature parameter extraction method [23], the wavelet packet band energy (WPBE) feature parameter extraction method [24], and the entropy measure-based feature parameter extraction method [25]. The commonly used time domain feature parameters were mean value, variance (VAR), peak, root mean square (RMS), skewness, kurtosis, pulse, margin, waveform, and peak value; the commonly used frequency domain feature parameters included gravity frequency (GF), mean-square frequency (MSF), and frequency variance (FV). The entropy-based feature parameters included amplitude spectrum entropy (ASE) and Hilbert marginal spectrum entropy (HMSE).

With the aforementioned feature parameter extraction methods, the feature parameters of the vibration data along the X, Y, and Z-axes were extracted, thus obtaining the triaxial 24-dimensional feature parameters. The triaxial operating state features of the motor under test are shown in Figure 6 (taking RMS, MSF, and Hilbert-Huang transform (HHT) energy spectrum entropy as examples). It can be seen that the feature parameters along the X, Y, and Z axes differ from each other.
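As an illustration of the extraction step, the sketch below computes a handful of the feature parameters named above for a single vibration record; the formulas follow common definitions and the function name is introduced here, so details may differ from the authors' implementations [22–25].

```python
import numpy as np

def basic_feature_parameters(x, fs=12800):
    """A few time and frequency domain feature parameters for one vibration
    record x, sampled at fs Hz (12.8 kHz per Table 1)."""
    feats = {
        "mean": np.mean(x),
        "var": np.var(x),
        "rms": np.sqrt(np.mean(x ** 2)),
        "peak": np.max(np.abs(x)),
        "skewness": np.mean((x - x.mean()) ** 3) / x.std() ** 3,
        "kurtosis": np.mean((x - x.mean()) ** 4) / x.std() ** 4,
        "pulse": np.max(np.abs(x)) / np.mean(np.abs(x)),
    }
    # Frequency domain parameters from the one-sided power spectrum.
    spec = np.abs(np.fft.rfft(x)) ** 2
    freq = np.fft.rfftfreq(len(x), d=1.0 / fs)
    feats["gf"] = np.sum(freq * spec) / np.sum(spec)        # gravity frequency
    feats["msf"] = np.sum(freq ** 2 * spec) / np.sum(spec)  # mean-square frequency
    # Amplitude spectrum entropy: Shannon entropy of the normalized spectrum.
    p = spec / spec.sum()
    feats["ase"] = float(-np.sum(p[p > 0] * np.log(p[p > 0])))
    return feats
```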

[Figure 6 shows, for each of the X, Y, and Z axes, the RMS, mean-square frequency, and HHT energy spectrum entropy amplitudes plotted against time over 0–1000 h.]

Figure 6. Root mean square (RMS), mean-square frequency (MSF), and Hilbert-Huang transform (HHT) energy spectrum entropy along each axis.

According to the definition of mutual information given in Section 2.1, the information shared by the feature parameters along the X, Y, and Z-axes was measured using the mutual information index. The distribution of mutual information between the feature parameters is shown in Figure 7 (taking the X-axis as an example), where the horizontal axis denotes an arbitrary combination of two of the 24-dimensional feature parameters along the X-axis; thus, there are 576 combinations. Each point represents the mutual information between two feature parameters in the 24-dimensional feature parameter set of the motor along the X-axis, with its numerical value shown by gradient colors. According to the calculations, the mutual information between the feature parameters along the X-axis was larger than 0, and its numerical value differed between pairs, which indicated that the information carried by the feature parameters along the X-axis overlapped with a certain relevance. Similarly, calculations suggested that the mutual information between the feature parameters along the Y and Z-axes, with different numerical values, was also larger than 0. This evidenced that the information carried by the feature parameters along the Y and Z-axes also overlapped, with a certain relevance between them.

[Figure 7 is a scatter plot of the 576 feature-parameter pairs against their mutual information values, shown with gradient colors.]

Figure 7. Mutual information of various feature parameters along the X-axis.

The UFS-MIFD algorithm proposed in Section 2.2 was used to order the original feature parameter sets of the motor under test along the X, Y, and Z-axes by importance. The results of the importance ordering of the feature parameters along the three axes, namely GX, GY, and GZ, are shown in Figure 8a–c, respectively.

It can be seen that the peak was the most important feature parameter in the original feature parameter sets along the X and Y-axes, while MSF was the most important feature parameter in the original feature parameter set along the Z-axis. Figure 8 also suggests significant differences between the feature parameters within the feature parameter sets along the three axes, which reflects the differences between the feature parameters along the different axes.

The importance orderings of the feature parameters of the motor under test along the X, Y, and Z-axes, namely GX, GY, and GZ, were evaluated based on the feature parameter subset evaluation criterion mentioned in Step 4 of Section 2.2, where the threshold parameter η = 0.05. Eventually, the feature subset SX of the X-axis contained the first 17 feature parameters of GX. Similarly, the feature subset SY contained the first 16 feature parameters of GY, and the feature subset SZ contained the first 13 feature parameters of GZ, as shown in Table 2.

[Figure 8 consists of three bar charts of the importance degree of each feature parameter, one per axis.]

Figure 8. Importance of various feature parameters along the X, Y, and Z-axes. (a) Feature importance order of X-axis; (b) Feature importance order of Y-axis; (c) Feature importance order of Z-axis.

Table 2. Feature parameter subsets along X, Y, and Z-axes.

X: 1 Peak; 2 Skewness; 3 Pulse; 4 Mean Value; 5 HESE; 6 Margin; 7 WPBE5; 8 RMS; 9 ASE; 10 WPBE1; 11 WPBE4; 12 WPBE2; 13 FV; 14 GF; 15 WPBE8; 16 HMSE; 17 WPBE6
Y: 1 Peak; 2 ASE; 3 MSF; 4 HMSE; 5 RMS; 6 WPBE4; 7 WPBE6; 8 WPBE5; 9 Skewness; 10 Margin; 11 WPBE2; 12 WPBE8; 13 Peak Value; 14 VAR; 15 Mean Value; 16 HESE
Z: 1 MSF; 2 WPBE2; 3 Skewness; 4 Margin; 5 VAR; 6 ASE; 7 WPBE4; 8 WPBE3; 9 Peak Value; 10 RMS; 11 Mean Value; 12 WPBE7; 13 FV

It is generally believed that the major feature information can be covered by the first two-dimensional feature parameters fused by the subspace learning method. In this study, the operating state information of the motor under test was fused following the subspace learning-based feature information fusion process shown in the third part of Figure 1, using subspace learning methods such as KPCA [3], PCA [2], LPP [5], and LDA [4]. Thus, the two-dimensional integrated feature parameters of the motor operating states were obtained. The final fusion result is shown in Figure 9. It can be seen that the motor operating degradation paths described by KPCA, PCA, and LPP fluctuated less than that described by LDA, which evidenced that KPCA, PCA, and LPP performed better than LDA in describing the motor operating state.

[Figure 9 plots the two fused features against time (0–1000 h) for Kernel PCA, PCA, LPP, and LDA.]

Figure 9. Final fusion results of the feature parameters of motor operating states.
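To make the two-stage fusion concrete, here is a minimal sketch with PCA standing in for the subspace learning step (KPCA, LPP, or LDA would slot in the same way); the function name and the use of scikit-learn are assumptions of this illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

def fuse_two_features(S_x, S_y, S_z):
    """Two-stage fusion following Figure 1: each axis's optimal feature subset
    (samples x features) is reduced to 2 comprehensive features, then the six
    axis-level features are fused again into 2 motor-level features."""
    per_axis = [PCA(n_components=2).fit_transform(S) for S in (S_x, S_y, S_z)]
    combined = np.hstack(per_axis)  # samples x 6 axis-level features
    return PCA(n_components=2).fit_transform(combined)
```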

5. Results Verification and Analysis

5.1. Health-Fault Diagnosis of Motor

As shown in Figure 10, the "health-fault" states of the motor under test were identified based on the feature fusion result of the motor operating state obtained in Section 4. Before the use of UFS-MIFD, information fusion of the original feature parameter set was made using the aforementioned four subspace learning methods; the result of the health-fault states obtained based on the information fusion according to the two-dimensional integrated feature parameters F1 and F2 is shown in Figure 10a. After the use of UFS-MIFD, information fusion of the optimal feature parameter subsets SX, SY, and SZ was made using the same four subspace learning methods; the result of the "health-fault" states obtained according to the two-dimensional integrated feature parameters F1* and F2* is shown in Figure 10b. It can be seen that an even better health-fault state diagnosis could be observed using the two-dimensional integrated motor parameters. In the following, a quantitative evaluation of the diagnostic result will be made.

[Figure 10 shows, for each of Kernel PCA, PCA, LPP, and LDA, scatter plots of the health and failure samples in the plane of the two fused features, (a) before and (b) after the use of UFS-MIFD.]

Figure 10. Comparison of the health-fault state diagnosis results based on the integrated feature parameters before and after the use of mutual information and fractal dimension-based unsupervised feature selection (UFS-MIFD). (a) Before the use of UFS-MIFD; (b) After the use of UFS-MIFD.

Quantitative evaluation of the health-fault state diagnosis shown in Figure 10 was carried out using the cluster evaluation index D. The form of the evaluation index D is as follows [26]:

D = \frac{tr(S_{w1}) + tr(S_{w2})}{tr(S_b)}    (11)

where S_{w1} and S_{w2} represent the within-class scatter matrices (covariance matrices) of the health and fault state samples, which can be used to characterize the distribution of the sample points of each state around the mean value; tr(S_{w1}) and tr(S_{w2}) are the traces of the within-class scatter matrices of the two classes of state samples, and a smaller value means a more concentrated internal distribution of the state samples and better aggregation; S_b is the between-class scatter matrix of the health and fault state samples, which characterizes the distribution of the state samples of the different classes in the space. The expression of S_b is given as follows:

S_b = \sum_{i=1}^{c} P(i) (M_i - M_0)(M_i - M_0)^T    (12)

where P(i) is the prior probability of the i-th class of state samples; M_i is the mean vector of the i-th class of state samples; M_0 is the overall mean vector of the state samples of the c classes, with M_0 = \sum_{i=1}^{c} P(i) M_i; and tr(S_b)

is the trace of the between-class scatter matrix of the two classes of state samples. A larger tr(S_b) suggests a more scattered distribution between the different classes of state samples, which better helps to distinguish the motor states. Therefore, the health-fault state diagnosis evaluation index D can be expressed as the ratio between the sum of the traces of the within-class scatter matrices of the two classes of state samples and the trace of the between-class scatter matrix. A smaller D suggests better efficacy of the subspace learning algorithm in distinguishing the health-fault states. The evaluation result for the health-fault state diagnosis effect shown in Figure 10 is given in Table 3.

Table 3. Evaluation of "health-fault" state diagnosis results based on integrated feature parameters before and after the use of UFS-MIFD.

Subspace Learning Method      KPCA     PCA      LPP      LDA
Before the use of UFS-MIFD    0.5488   2.4611   0.8833   2.7966
After the use of UFS-MIFD     0.5373   2.2265   0.2278   2.4750
Percentage improvement        2.1%     9.53%    74.21%   11.50%
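For reference, a small sketch of the index in Equations (11) and (12) for the two-class case is given below; the function name and the use of sample covariances as within-class scatter are assumptions of this illustration.

```python
import numpy as np

def cluster_index_D(healthy, faulty):
    """Evaluation index D of Eqs. (11)-(12) for two classes of fused samples,
    each a (samples x 2) array; a smaller D means better class separation."""
    # Within-class scatter: traces of the two class covariance matrices.
    tr_w = (np.trace(np.cov(healthy, rowvar=False))
            + np.trace(np.cov(faulty, rowvar=False)))

    n1, n2 = len(healthy), len(faulty)
    p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)   # prior probabilities P(i)
    m1, m2 = healthy.mean(axis=0), faulty.mean(axis=0)
    m0 = p1 * m1 + p2 * m2                    # overall mean vector M_0
    # Between-class scatter S_b, Eq. (12), for c = 2 classes.
    S_b = (p1 * np.outer(m1 - m0, m1 - m0)
           + p2 * np.outer(m2 - m0, m2 - m0))
    return tr_w / np.trace(S_b)
```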

It can be seen from Table 3 that the information fusion performance of all four subspace learning methods (KPCA, PCA, LPP, and LDA) improved after using UFS-MIFD for feature selection, which enabled them to distinguish the motor health-fault states more correctly and clearly. In addition, the degree of performance enhancement is related to the choice of subspace learning algorithm.

5.2. State Prediction of Motor

Motor state prediction was conducted using the Elman neuron network prediction method based on the discussion above. As shown in Figure 11, the Elman network is a typical dynamic recurrent neuron network. Unlike common neuron network structures, it additionally contains an associate layer that is designed to memorize the output value of the hidden layer at the previous moment. This layer is equivalent to an operator with a one-step delay, which provides the whole network with a dynamic memory function. The mathematical model of the Elman neuron network is as follows:

x(k) = f[\omega^x_{ij} x_c(k) + \omega^u_{ij} u(k-1)]    (13)

x_c(k) = \alpha x_c(k-1) + x(k-1)    (14)

y(k) = g[\omega^y_{ij} x(k)]    (15)

where u(k-1) is the input of the input layer node; x(k) is the output of the hidden layer node; y(k) is the output of the output layer node; x_c(k) is the feedback state vector; \omega^x_{ij}, \omega^u_{ij}, and \omega^y_{ij} are the connection weight matrices from the associate layer to the hidden layer, from the input layer to the hidden layer, and from the hidden layer to the output layer, respectively; g(·) is the transfer function of the neurons in the output layer; f(·) is the transfer function of the neurons in the hidden layer, for which a sigmoid function is usually used; and α is the self-feedback gain factor, with 0 < α < 1.

[Figure 11 shows the network structure: the input layer u(k-1) feeds the hidden layer x(k) through \omega^u_{ij}; the associate layer x_c(k), with self-feedback gain α, feeds the hidden layer through \omega^x_{ij}; and the hidden layer feeds the output layer y(k) through \omega^y_{ij}.]

Figure 11. Elman neuron network structure.
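A minimal forward-pass sketch of Equations (13)–(15) is shown below; the weight shapes, the zero initial states, the linear output transfer, and α = 0.5 are illustrative assumptions (in practice the weights are learned during training).

```python
import numpy as np

def elman_forward(U, W_x, W_u, W_y, alpha=0.5):
    """Forward pass of the Elman model in Eqs. (13)-(15).

    U: sequence of input vectors u(0..T-1); W_x, W_u, W_y: associate-to-hidden,
    input-to-hidden, and hidden-to-output weight matrices."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))  # hidden transfer f(.)
    x_prev = np.zeros(W_x.shape[0])  # x(k-1)
    x_c = np.zeros(W_x.shape[0])     # x_c(k-1)
    outputs = []
    for k in range(1, len(U)):
        x_c = alpha * x_c + x_prev                 # Eq. (14): context x_c(k)
        x_k = sigmoid(W_x @ x_c + W_u @ U[k - 1])  # Eq. (13): hidden output x(k)
        outputs.append(W_y @ x_k)                  # Eq. (15): y(k), linear g(.)
        x_prev = x_k
    return np.array(outputs)
```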

In this study, the two-dimensional integrated feature information of the motor operating states was predicted. The first 1234 points of the feature parameters were used to train the Elman neuron network model, thus obtaining an Elman neuron network training model in which 50 points were taken as the input and one point as the output. The data collected from the 1235-th to the 1294-th points served as the verification data to verify the model precision and adjust the parameters. The 60 points after the 1294-th point were then predicted using the aforementioned model. The root mean square error (RMSE) was used to measure the error between the predicted results and the observed values based on the following formula [27]:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_{pre,i} - X_{obs,i})^2}    (16)

where X_{pre,i} is the predicted value; X_{obs,i} is the observed value; and n is the number of points to be predicted. Prediction results are shown in Table 4.

Table 4. Comparison between the predicted and observed values (RMSE) of the two-dimensional integrated feature parameter states before and after the use of UFS-MIFD.

                              KPCA                 PCA                  LPP                  LDA
                              1st       2nd        1st       2nd        1st       2nd        1st        2nd
Before the use of UFS-MIFD    0.3291    0.3077     0.7940    0.4351     1.1280    0.4888     16.3521    8.7041
After the use of UFS-MIFD     0.3175    0.2740     0.6370    0.3205     1.0609    0.4420     12.5659    6.4507

The prediction results above suggested enhanced fusion feature prediction precision for all four subspace learning algorithms after using UFS-MIFD for feature selection. This also indicated that UFS-MIFD contributed to the performance enhancement of subspace learning algorithms.

6. Conclusions

To overcome the information fusion performance decline of subspace learning algorithms caused by the redundant and irrelevant features in the multidimensional feature parameter set, a mutual information and fractal dimension-based unsupervised feature selection algorithm is

studied. A UFS-MIFD method is proposed using various theories and methods, including the original feature extraction method, mutual information, and fractal theory, in response to the long computing time, high time complexity, and possibility of failing to identify the optimal solution that plague previous unsupervised feature selection algorithms. With this method, a feature importance ordering algorithm that takes the relevance and redundancy of features into overall consideration is developed. The optimal feature subset is identified by eliminating the highly-redundant feature parameters with low relevance to the whole feature parameter set, based on the fractal dimension-based feature subset evaluation criterion. In addition, a performance degradation test of the brushless DC motor of a multi-rotor UAV is designed to verify the proposed method based on the vibration signal data. To verify the proposed UFS-MIFD, the information fusion performance of subspace learning algorithms before and after the use of UFS-MIFD is compared by measuring the motor health-fault diagnosis capacity and the motor state prediction effect. The comparison results suggest that UFS-MIFD can enhance the information fusion performance of subspace learning methods. Not only is the proposed method able to reduce the negative influence of irrelevant features, redundant features, and excessive dimensionality on subsequent algorithms and decisions, and to enhance the precision and stability of subsequent research results, but it is also of high engineering value, since it can be used for the feature selection of large volumes of unlabeled data.

With the limited data of the motor under test, however, there is still room for the improvement and optimization of the proposed method as the number of test subjects and the sample size increase. Moreover, because the application of the proposed method in this paper is specific, it has only been shown that the method can be applied to the feature selection of vibration signals of similar UAV operating systems; in other words, it is not clear whether the behavior of the proposed method will be the same for different types of signals in other applications. Therefore, the adaptability and universality of the proposed method will be further discussed and investigated in future research.

Author Contributions: Y.H., X.W. and L.W. proposed the idea of the research, designed the structure, and analyzed the theory; Y.H., X.W. and L.W. conceived, designed and performed the test; Y.H. analyzed the data and wrote the paper.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Tabakhi, S.; Moradi, P.; Akhlaghian, F. An unsupervised feature selection algorithm based on ant colony optimization. Eng. Appl. Artif. Intell. 2014, 32, 112–123.
2. Widodo, A.; Yang, B.S. Application of nonlinear feature extraction and support vector machines for fault diagnosis of induction motors. Expert Syst. Appl. 2007, 33, 241–250.
3. Zhou, H.T.; Chen, J.; Dong, G.M.; Wang, H.C.; Yuan, H.D. Bearing fault recognition method based on neighbourhood component analysis and coupled hidden Markov model. Mech. Syst. Signal Process. 2016, 66, 568–581.
4. Jin, X.H.; Zhao, M.B.; Chow, T.W.S.; Pecht, M. Motor bearing fault diagnosis using trace ratio linear discriminant analysis. IEEE Trans. Ind. Electron. 2014, 61, 2441–2451.
5. Ding, X.X.; He, Q.B.; Luo, N.W. A fusion feature and its improvement based on locality preserving projections for rolling element bearing fault classification. J. Sound Vib. 2015, 335, 367–383.
6. Ma, M.; Chen, X.F.; Zhang, X.L.; Ding, B.Q.; Wang, S.B. Locally linear embedding on Grassmann manifold for performance degradation assessment of bearings. IEEE Trans. Reliab. 2017, 66, 467–477.
7. Xu, J.L.; Zhou, Y.M.; Chen, L.; Xu, B.W. An unsupervised feature selection approach based on mutual information. J. Comput. Res. Dev. 2012, 49, 372–382.
8. Panday, D.; de Amorim, R.C.; Lane, P. Feature weighting as a tool for unsupervised feature selection. Inf. Process. Lett. 2018, 129, 44–52.
9. Jing, S.Y. A hybrid genetic algorithm for feature subset selection in rough set theory. Soft Comput. 2014, 18, 1373–1382.
10. Lu, L.; Yan, J.H.; de Silva, C.W. Dominant feature selection for the fault diagnosis of rotary machines using modified genetic algorithm and empirical mode decomposition. J. Sound Vib. 2015, 344, 464–483.
11. Wan, Y.C.; Wang, M.W.; Ye, Z.W.; Lai, X.D. A feature selection method based on modified binary coded ant colony optimization algorithm. Appl. Soft Comput. 2016, 49, 248–258.
12. Tabakhi, S.; Moradi, P. Relevance-redundancy feature selection based on ant colony optimization. Pattern Recognit. 2015, 48, 2798–2811.
13. Zhang, C.; Ni, Z.W.; Ni, L.P.; Tang, N. Feature selection method based on multi-fractal dimension and harmony search algorithm and its application. Int. J. Syst. Sci. 2016, 47, 3476–3486.
14. Ni, Z.; Zhu, X.; Ni, L.; Cheng, M.; Wang, Y. An improved discrete optimization algorithm based on artificial fish swarm and its application for attribute reduction. J. Inf. Comput. Sci. 2015, 12, 2143–2154.
15. Pacheco, F.; Cerrada, M.; Sánchez, R.V.; Cabrera, D.; Li, C.; de Oliveira, J.V. Attribute clustering using rough set theory for feature selection in fault severity classification of rotating machinery. Expert Syst. Appl. 2017, 71, 69–86.
16. Guariglia, E. Entropy and fractal antennas. Entropy 2016, 18, 84.
17. Crutchfield, J.P.; Feldman, D.P. Regularities unseen, randomness observed: Levels of entropy convergence. Chaos Interdiscip. J. Nonlinear Sci. 2003, 13, 25–54.
18. Ramírez-Gallego, S.; Lastra, I.; Martínez-Rego, D.; Bolón-Canedo, V.; Benítez, J.M.; Herrera, F.; Alonso-Betanzos, A. Fast-mRMR: Fast minimum redundancy maximum relevance algorithm for high-dimensional big data. Int. J. Intell. Syst. 2017, 32, 134–152.
19. Yan, G.H.; Li, Z.H. A two phases unsupervised sequential forward fractal dimensionality reduction algorithm. J. Comput. Res. Dev. 2008, 45, 1955–1964.
20. Zanette, D.H. Generalized Kolmogorov entropy in the dynamics of multifractal generation. Phys. A Stat. Mech. Appl. 1996, 223, 87–98.
21. Guariglia, E. Spectral analysis of the Weierstrass-Mandelbrot function. In Proceedings of the 2nd International Multidisciplinary Conference on Computer and Energy Science, Split, Croatia, 12–14 July 2017.
22. Xiao, Y.; Ding, E.; Chen, C.; Liu, X.; Li, L. A novel characteristic frequency bands extraction method for automatic bearing fault diagnosis based on Hilbert-Huang transform. Sensors 2015, 15, 27869–27893.
23. Chen, B.Y.; Li, H.R.; Yu, H.; Wang, Y.K. A hybrid domain degradation feature extraction method for motor bearing based on distance evaluation technique. Int. J. Rotating Mach. 2017, 2017, 1–11.
24. Ocak, H.; Loparo, K.A.; Discenzo, F.M. Online tracking of bearing wear using wavelet packet decomposition and probabilistic modeling: A method for bearing prognostics. J. Sound Vib. 2007, 302, 951–961.
25. Wang, Y.S.; Ma, Q.H.; Zhu, Q.; Liu, X.T.; Zhao, L.H. An intelligent approach for engine fault diagnosis based on Hilbert-Huang transform and support vector machine. Appl. Acoust. 2014, 75, 1–9.
26. Michael, M.; Lin, W.C. Experimental study of information measure and inter-intra class distance ratios on feature selection and orderings. IEEE Trans. Syst. Man Cybern. 1973, 3, 172–181.
27. Madhiarasan, M.; Deepa, S.N. A novel method to select hidden neurons in ELMAN neural network for wind speed prediction application. WSEAS Trans. Power Syst. 2018, 13, 13–30.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
