Modified Distance Function and Information Entropy for Feature Extraction: A Guide

Amir Hosein Zamanian
Southern Methodist University, Dallas, TX, 75206

Abstract— This document is a supplementary record for "Application of energies of optimal frequency bands for fault diagnosis based on modified distance function" that discusses the modified distance function (MDF) and its relationship with information entropy in more detail. I explain why the energies of the optimal frequency bands obtained by maximization of the MDF are desirable for feature extraction. Furthermore, guidelines are provided for the parameter adjustment of the MDF.

Keywords—Modified distance function, Information entropy, Shannon entropy, Feature extraction.
I. INTRODUCTION

The modified distance function (MDF) of the energies of a signal in different frequency bands, delimited by the frequencies $M_i$, was first presented by Zamanian and Ohadi [1] to reduce the number of extracted features, by maximization of the MDF, for fault diagnosis applications. The MDF is defined as

$$\mathrm{MDF}=\frac{\displaystyle\sum_{n=1}^{C}\sum_{p>n}\left[\sum_{m=1}^{k}\left(\bar{E}_{m,n}-\bar{E}_{m,p}\right)^{2}\right]^{1/2}}{\displaystyle 1+\alpha\left(N_{F}-1\right)+\beta\sum_{n=1}^{C}\frac{1}{O_{n}}\sum_{p=1}^{O_{n}}\left[\sum_{m=1}^{k}\left(E_{m,n,p}-\bar{E}_{m,n}\right)^{2}\right]^{1/2}}$$

and

$$\bar{E}_{m,n}=\frac{1}{O_{n}}\sum_{p=1}^{O_{n}}E_{m,n,p}$$
where $E_{m,n,p}$ denotes the energy of the m-th frequency band, between $M_{m-1}$ and $M_m$, in the p-th observation of the n-th class; $\bar{E}_{m,n}$ denotes the average energy of the m-th frequency band in the n-th class with $O_n$ observations; and C is the total number of classes. Zamanian and Ohadi used the MDF to reduce the number of features extracted from the energies of vibration signals based on Parseval's theorem [2]. The energy of a signal is a meaningful positive scalar quantity, and it can be split into other positive quantities as a function of frequency. The number of bands and the boundary frequencies are optimized with the MDF to obtain features that are as separable as possible. Separable features improve the accuracy and classification performance of expert systems for classification of the signals. The numerator of the MDF is a regular distance function of the energies of the frequency bands (the features), so maximization of the MDF measures how far the feature sets can be positioned from each other, while the denominator contains penalty factors for the number of features and for the dispersion of the data in each class.
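To make the definition concrete, the following is a minimal NumPy sketch of the MDF as defined above. This is my own illustration, not code from the paper; the function name `mdf` and the list-of-arrays input layout are assumptions.

```python
import numpy as np

def mdf(energies, alpha=0.1, beta=0.0):
    """Sketch of the modified distance function (MDF).

    energies : list of C arrays, one per class, each of shape (O_n, k),
               holding O_n observations of k frequency-band energies.
    alpha    : penalty weight on the number of features N_F = k.
    beta     : penalty weight on the within-class dispersion.
    """
    C = len(energies)
    k = energies[0].shape[1]
    means = [E.mean(axis=0) for E in energies]        # class centers E-bar_{m,n}

    # Numerator: Euclidean distances between all pairs of class centers.
    numer = sum(np.linalg.norm(means[n] - means[p])
                for n in range(C) for p in range(n + 1, C))

    # Denominator: feature-count penalty plus average within-class dispersion.
    disp = sum(np.linalg.norm(E - means[n], axis=1).sum() / E.shape[0]
               for n, E in enumerate(energies))
    return numer / (1.0 + alpha * (k - 1) + beta * disp)
```

With `beta = 0` and two classes whose observations coincide with their centers, this reduces to the pairwise distance divided by the feature-count penalty $1+\alpha(k-1)$.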
The question that needs to be answered is: why are the energies of the optimal frequency bands obtained by maximization of the MDF suitable for feature extraction? I address this question in view of information entropy, the dispersion of the features, and the number of features.

II. MODIFIED DISTANCE: CONCEPTUAL FRAMEWORK

For simplicity and without loss of generality, assume that there are two classes, A and B, that the origin is translated to the center of class B, and that the dispersion of the features is zero. The distance of class A from the origin then gives the distance between classes A and B in feature space. Two extreme cases can be examined: when the energy is distributed uniformly over all frequency bands, and when the energy is concentrated in only one band. If the energy is distributed uniformly over k frequency bands, then $E_i = E_{AB}/k$ and it can be shown that the regular distance is $D_U = E_{AB}/\sqrt{k}$. The subscript AB denotes A relative to B. When all the energy is concentrated in one band, $E_i = 0$ for $i \neq m$ and $E_m = E_{AB}$, the regular distance of the features is $D_C = E_{AB}$, no matter how many features there are. Any other energy distribution lies between these two cases. The upper and lower limits of a regular distance function versus the number of features are shown in Fig. 1. Maximization of the distance function pushes the extracted features towards the lowest dimension ($N_F = k$) and towards the upper limit (i.e., the concentrated form) simultaneously.
Fig. 1. Regular distance function versus the NF.
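The two extreme cases above can be verified numerically; a quick sketch, where $E_{AB}$ and $k$ are arbitrary illustrative values:

```python
import numpy as np

E_AB, k = 8.0, 4   # total energy and number of bands (illustrative values)

# Uniform case: energy spread equally over k bands.
uniform = np.full(k, E_AB / k)
D_U = np.linalg.norm(uniform)      # Euclidean distance from class B at the origin
assert np.isclose(D_U, E_AB / np.sqrt(k))

# Concentrated case: all energy in a single band.
concentrated = np.zeros(k)
concentrated[0] = E_AB
D_C = np.linalg.norm(concentrated)
assert np.isclose(D_C, E_AB)       # independent of k
```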
An increase in the number of features moves the distance towards lower values in the uniform-distribution case, while in the concentrated form the distance is neutral with respect to the number of features. The latter is not desirable from the viewpoint of feature reduction. However, the MDF for the concentrated case with $\beta = 0$ equals $E_{AB}/(1+\alpha(k-1))$. Therefore, an increase in the number of features influences both the upper and lower limits, as shown in Fig. 2.
Fig. 2. The MDF versus the NF (α = 0.1); both upper and lower limits are affected by the NF.

Fig. 3. The MDF versus the NF (α_max = 0.2).
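The behavior plotted in Fig. 2 can be tabulated directly from the two closed-form limits, $E_{AB}/(1+\alpha(k-1))$ for the concentrated case and $E_{AB}/\big(\sqrt{k}\,(1+\alpha(k-1))\big)$ for the uniform case. A short sketch with illustrative values ($E_{AB}=1$, $\alpha=0.1$, $\beta=0$):

```python
import numpy as np

E_AB, alpha = 1.0, 0.1
for k in range(1, 6):
    upper = E_AB / (1 + alpha * (k - 1))                 # concentrated case
    lower = E_AB / (np.sqrt(k) * (1 + alpha * (k - 1)))  # uniform case
    print(f"N_F = {k}: lower = {lower:.3f}, upper = {upper:.3f}")
```

Unlike the regular distance, both limits now decrease as the number of features grows.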
III. INFORMATION ENTROPY

I use the concept of the information entropy of the features, well known as the Shannon entropy [3],

$$S = -\sum_{i=1}^{k} p_i \log(p_i)$$

where the $p_i$ are the normalized energies of the frequency bands,

$$p_i = \frac{E_i}{E_{AB}}.$$
In the case of the concentrated energy distribution, all $p_i$ are zero except for the m-th frequency band, which equals one:

$$p_i = 0 \;\; \forall i \neq m, \qquad p_m = 1.$$

In addition, $S_C = 0$ (consider $\lim_{p_i \to 0} p_i \log(p_i) = 0$).
For uniformly distributed energies in the frequency bands,

$$\forall i \in \{1,2,\ldots,k\}: \quad kE_i = E_{AB}, \qquad p_i = \frac{E_i}{E_{AB}} = \frac{1}{k},$$

$$S_U = -\sum_{i=1}^{k}\frac{1}{k}\log\left(\frac{1}{k}\right) = -\log\left(\frac{1}{k}\right) = \log(k) > 0.$$
This implies that the entropy of the normalized energies decreases if (a) the concentration of energy in the frequency bands increases (moving towards $S_C$), and (b) the number of features decreases. These two trends are comparable with the behavior of the distance function.
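The two entropy limits derived above can be checked numerically. A self-contained sketch (the helper name `shannon_entropy` is my own; the natural logarithm is assumed, matching the derivation):

```python
import numpy as np

def shannon_entropy(p):
    """S = -sum p_i log(p_i), taking 0*log(0) as 0 in the limit."""
    p = np.asarray(p, dtype=float)
    nz = p > 0                      # drop zero entries: lim p->0 of p*log(p) = 0
    return -np.sum(p[nz] * np.log(p[nz]))

k = 4
S_C = shannon_entropy([1.0] + [0.0] * (k - 1))   # concentrated: S_C = 0
S_U = shannon_entropy(np.full(k, 1.0 / k))       # uniform: S_U = log(k)
assert np.isclose(S_C, 0.0)
assert np.isclose(S_U, np.log(k))
```

Any other normalized energy distribution gives an entropy between these two values.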
IV. PARAMETER SELECTION

As seen in Fig. 1, the regular distance $D_C$ and its corresponding entropy $S_C$ are neutral with respect to the number of features; in the MDF, however, the weight factor $\alpha$ in the denominator affects the concentrated case as well. Technically, the MDF is valid for every $\alpha > 0$; however, an excessive increase of $\alpha$ makes the number of features dominate the optimization process, and this should be avoided. My recommendation for choosing $\alpha$ is to increase it gradually from zero and not to exceed $\alpha_{max} = 0.2$ (see Fig. 3). The dispersion factor $\beta$ must be treated similarly to the number of features: because the distance function of the class centers is neutral to the dispersion of the features (no matter whether they are scattered or clustered), the dispersion should be accounted for in the MDF. Any positive value can be used for $\beta$; my recommendation is not to exceed $\beta_{max} = 1/C$ (where C is the number of classes in the dataset), to avoid overweighing the dispersions relative to the distances between the classes.
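The dominance of the feature-count penalty for large $\alpha$ can be seen by tabulating the concentrated-case MDF, $E_{AB}/(1+\alpha(N_F-1))$, over a sweep of $\alpha$ (illustrative values only, $\beta = 0$):

```python
import numpy as np

E_AB = 1.0
feature_counts = (1, 2, 4, 8)
for alpha in (0.05, 0.1, 0.2, 1.0):
    # Concentrated-case MDF with beta = 0: E_AB / (1 + alpha*(N_F - 1)).
    vals = [E_AB / (1 + alpha * (k - 1)) for k in feature_counts]
    print(f"alpha = {alpha}: {np.round(vals, 3)}")
```

For small $\alpha$ the MDF stays close to $E_{AB}$ regardless of $N_F$, while for $\alpha = 1$ it collapses roughly as $1/N_F$, i.e., the feature count rather than the class separation drives the optimization.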
REFERENCES

[1] A. H. Zamanian and A. Ohadi, "Application of energies of optimal frequency bands for fault diagnosis based on modified distance function," J. Mech. Sci. Technol., vol. 31, pp. 2701-2709, Jul. 2017.
[2] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing.
[3] E. T. Jaynes, "Information theory and statistical mechanics," Physical Review, vol. 106, no. 4, p. 620, 1957.