The Effectiveness of Wavelet Based Features on Power Quality Disturbances Classification in Noisy Environment Marija Markovska, Dimitar Taskovski Ss. Cyril and Methodius University in Skopje Faculty of Electrical Engineering and Information Technologies Skopje, R. Macedonia marijam,
[email protected] Abstract—Power quality (PQ) disturbances classification plays an essential role in ensuring high quality power supply of the power grid. One of the main issues in classification is how to extract the “right” features from massive amount of PQ data. The feature selection should be performed for the aim of not only increasing the classification accuracy, but in the same time reducing the calculation time of the classification algorithm. Accordingly, in this work we investigate the effectiveness of the wavelet based features on the classification accuracy in order to perform optimal feature extraction method. The investigation is made using three different classifiers, in case of pure PQ signals and PQ signals accompanied with white Gaussian noise. The results show that the effectiveness of a given feature is not general, but it depends on the kind of the other features it is used with and the noise level present in the signal. Index Terms—Decision tree, feature extraction, power quality, random forest, support vector machine.
I.
INTRODUCTION
Nowadays, when the use of intelligent electrical devices and polluting loads continually increases, power quality (PQ) of the power grid is heavily disturbed. PQ disturbances produce various undesirable effects which affect the compatibility and longevity of the devices connected to the power supply network. The effects are plainly noticeable at industrial or public facilities where the voltage and current abnormalities cause malfunction of the equipment and may also lead to interruption of the working process. Therefore, the quality of the power has become a major concern for the electric utilities and consumers which are expected to acquire an ideal voltage and current waveform at rated frequency. In order to ensure high quality power supply, it is very important to analyze and classify these disturbances. Different methods for automated detection and classification of PQ events have been proposed recently. Some commonly used classifiers based on artificial intelligence (AI) are rule-based expert systems (ES) [1],
artificial neural networks (ANN) [2], fuzzy logic (FL) classification systems [3], support vector machines (SVM) [4] and data mining based classifiers such as decision trees (DT) [5] and random forest (RF) [6]. All these methods use feature vectors obtained from disturbance waveforms for classification of different PQ events. Researchers have proposed several digital signal processing techniques which can be used for extraction of features that characterize PQ disturbances [7]. Such techniques are short-time Fourier transform (STFT), fast working Fourier transform (FT), Neural Networks, Stransform (ST) and wavelet transform (WT) [8], [9]. WT analyses give frequency and time information accurately by convolving the dilated and translated wavelet with the input signal. This property makes the WT approach suitable for detecting various deviations in voltage and current waveform, caused by different PQ disturbances. Therefore, among all signal processing techniques, WT has been most extensively used through the last years. Extraction of features from a disturbed signal provides information that helps to recognize the faults responsible for the present disturbances. Accordingly, the feature extraction is the most important component for designing an intelligent system for classification, since even the best classifier will perform poorly if the features are not properly chosen. However, precise and faster feature extraction is still an open issue. Different feature vector construction methods for selection of “better” features among a given feature set have been proposed. All these methods introduce additional mathematics and with that increase the computational complexity of the classification algorithm. Hence, in this work a research using the comparison “how useful the wavelet based features are with respect to each other” for classification of 11 different types of PQ disturbances is presented. The investigation is made using all combinations of the wavelet based features, in case of pure PQ signals and PQ signals accompanied with 20bB, 30dB, 40dB and 50dB white
978-1-5386-0517-2/18/$31.00 ©2018 IEEE
Gaussian noise. The aim of the investigation is to show that high classification accuracy can be achieved using smaller number of features in the feature extraction process. II.
WAVELET TRANSFORM
Wavelet transform is a representation of a function by wavelets which are scaled and translated copies, known as “daughter wavelets”, of a finite-length oscillating waveform, known as “mother wavelet”. WT analyses are intensively used in the last years, mostly due to the fact that WT provides not only accurate frequency information, but in the same time accurate information about time localization of the components. This property makes WT suitable for analyses of transient events that may occur in power system [10]. The continuous wavelet transform (CWT) of a signal x(t) is given with the expression: ∞
CWT (a, b) =
x(t )ψ
ab (t ) dt
(1)
−∞
where the function ψ(t) is the basic wavelet function or the so called mother wavelet. The definition of mother wavelet is given as:
ψ ab (t ) =
1 a
t −a , a, b ∈ ℜ; a ≠ 0 a
ψ
(2)
where a is the scale parameter, expressed as |1/frequency| and corresponds to frequency information, while b is the translation parameter, related to the location of the wavelet function shifted through the signal, and corresponds to the time information. Since the transformation is achieved by dilating and translating the mother wavelet continuously, it generates substantial redundant information. Therefore, instead of continuous dilatation and translation, the mother wavelet can be dilated and translated discretely. That can be achieved by selecting the parameters a = a0m and b = nb0 a0m ,
III.
FEATURE EXTRACTION PROCESS
Wavelet transform is in fact a measure of similarity between the mother wavelet and the input signal. Thus, a proper selection of the mother wavelet function is the key for performing a successful DWT application. In this work Daubechies4 is used as one of the most frequently used wavelets in PQ problems [11]. The number of the decomposition levels, l, of the DWT is also of significant importance. Higher number of levels will bring more information in the system, i.e. an extended vector of features, and in that way will increase the classification accuracy. Additionally, choosing higher l will result in increased calculation complexity of the classification algorithm. Investigational results have shown that increasing the number of levels after l=6 do not significantly affect the accuracy of the algorithm [11]. Therefore, in this work the number of decomposition levels is chosen to be 6. The wavelet coefficients obtained by the multiresolution and wavelet analyses of the disturbance signal contain effective feature information. Thence, with the detail coefficients obtained at each decomposition level and the approximation coefficients from the last level, feature vector is constructed. Several techniques for feature extraction have been proposed, using the wavelet based features given in Table I. For instance, in [11]-[14] feature extraction approaches are presented using only one wavelet based feature, i.e. in [12] norm entropy is used, while in the others only energy is used. In [15] energy, standard deviation and Shannon entropy are used. In [16] all nine features are used. TABLE I. EXPRESSIONS OF THE WAVELET BASED FEATURES
Labels
Features based on DWT 2 N Ei = Cij j =1
f1
Energy
where a0 and b0 are fixed constants with values a0 > 1 and
f2
Mean
b0 > 0, m, n ∈ and is a set of integers. As a result discrete wavelet transform (DWT) is obtained, defined with the expression:
f3
Standard deviation
σi =
f4
Skewness
SKi =
f5
Shannon Entropy
f6
RMS
f7
Kurtosis
DWT [m, n] =
1 a0m
∞
−∞
k − nb0 a0m f [k ]ψ . a0m
(3)
With the choice of a0 = 2 and b0 = 1 a dyadic orthonormal WT is achieved and can be easily and quickly implemented using the filter bank techniques known as Multi-Resolution Analysis (MRA). The filter bank is used to decompose the signal into different number of levels using a low-pass filter and a high-pass filter. The basic idea of the MRA is the idea of successive approximation together with the idea of “added details”. The low-frequency part of the input signal is split again into two parts of high and low frequencies. Depending on the size of the signal and the type of application, this process can be repeated several times. As a result, logarithmic decomposition of the frequency spectra is obtained.
μi =
1 N C N j =1 ij
2 1 N Cij − μi N j =1
(
)
1 N Cij − μi 6 N j =1 σ i
3
N 2 2 SEi = − Cij log(Cij ) j =1 rmsi =
KRTi =
1 N 2 Cij N j =1
N 1 N Cij − μi 24 N j =1 σ i
4 − 3
f8
Log-energy entropy
N 2 LOEi = log(Cij ) j =1
f9
Norm entropy
N P NEi = (Cij ) ,1 ≤ P j =1
In the equations, the index i = 1, 2,..., l denotes level of decomposition, Cij corresponds to the detail coefficients obtained at each decomposition level and approximation coefficients from the last level, while N is their number. In this work, investigation of the influence of the wavelet based features on the PQ disturbance classification is performed. The investigation is made in case of pure PQ signals and PQ signals accompanied with white Gaussian noise. For the aim of the investigation three different classifiers are used, which are SVM, DT and RF. IV.
CLASSIFICATION RESULTS
The PQ signals are generated according to the mathematical definitions of nine different PQ disturbances given in [17], using Matlab 8.5. Each signal is sampled with sampling frequency of 3.2 KHz (64 samples/cycle) and from every signal 10 cycles are included, such that each signal contains 640 samples. The fundamental frequency is 50 Hz. The effectiveness of the wavelet based features is investigated in case of 11 PQ classes. The classes are normal, swell, sag, harmonic, outage, sag with harmonic, swell with harmonic, flicker, oscillatory transient, notch and spike, and 1000 disturbances from each class are included. In order to extend the investigation and to achieve some level of non-stationarity, the pure PQ signals are synthesized in noisy
environments by adding evenly distributed white Gaussian noise with signal-to-noise ratio (SNR) values of 20dB, 30dB, 40dB and 50dB. The aim of the performed investigation is choosing an optimal feature combination used in the feature extraction process for each classifier. Some of the highest accuracies obtained for all combinations of the wavelet based features are presented in the following subsections. A. Support Vector Machine SVM is a statistical learning technique based on structural risk minimization method. Its main objective is building a model with the use of training set, where each sample belongs into one of the possible classes, class 1 or class 2. With the trained SVM, a prediction model can be achieved, which will separate new samples of the given classes using a hyperplane. When the two classes cannot be linearly separated, a soft margin SVM technique is performed. This technique can choose a hyperplane which can still best separate the data, by applying a kernel function that projects the feature vector into a vector space where all of the features are linearly separable. In this paper, the SVM classifier is performed using libSVM implementation for Matlab [18]. The classification model is built using linear type of the kernel function. From the obtained classification results, some of the most accurate are given in Table II, and graphically presented in Fig. 1.
TABLE II. ON THE CHOICE OF FEATURE COMBINATION USING SVM CLASSIFIER
1
20dB noise Features Acc. (f) (%) 1 77.18
30dB noise Features Acc. (f) (%) 1 84.36
40dB noise Features Acc. (f) (%) 1 86.09
50dB noise Features Acc. (f) (%) 5 85.76
Pure signal Features Acc. (f) (%) 5 94.85
2 3
1,5 1,5,9
84.15 85.12
1,5 1,5,9
89.21 90.35
1,5 1,5,9
91.85 92.65
5,7 1,5,7
92.49 94.04
5,8 5,7,8
97.24 97.99
4 5
1,4,5,9 1,4,5,7,9
85.12 85.39
1,4,5,9 1,4,5,7,8
90.34 90.48
1,5,7,8 1,4,5,7,8
92.88 92.79
1,5,7,9 1,5,6,7,9
94.91 94.64
1,5,7,8 1,4,5,7,8
98.13 98.09
6 7
1,2,5,6,7,9 1,3,4,5,6,7,9
85.55 85.51
1,3,4,5,6,9 1,3,4,5,6,7,9
90.47 90.56
1,2,5,6,7,9 1,2,3,5,7,8,9
93.00 93.58
1,4,5,6,7,9 1,2,3,4,5,6,7
94.75 94.56
1,3,4,5,7,8 1,2,3,5,7,8,9
98.12 98.12
8 9
1,2,3,4,5,6,7,9 All nine
85.54 83.30
1,2,3,4,5,6,7,9 All nine
90.59 90.63
1,2,3,4,5,6,7,9 All nine
93.32 93.19
1,2,3,4,5,6,7,9 All nine
94.91 93.98
1,3,4,5,6,7,8,9 All nine
98.04 98.11
N°
Figure 1. Effectiveness of the wavelet based features in case of SVM classifier
From the table and the figure it is evident that after using a combination of three wavelet based features up to nine features the classification accuracies are almost identical, weather the PQ signal is pure or accompanied with noise. Accordingly, the combination of three features, i.e. “f1, f5, f7”, “f1, f5, f9” or “f5, f7, f8”, gives high accuracy with less number of features and represents an optimal feature combination. Using these features additional investigation is made. Namely, in Fig. 2 there is graphically represented how only these feature combinations affect the classification accuracy in case of different noise levels.
Figure 2. Effectiveness of the optimal feature combination using SVM classification method
Due to the fact that in measurements the level of the noise is not fixed, there is a need of feature combination that gives the highest or nearly highest accuracy for each noise level. In accordance with that and with the results from Figure 2, the feature combination energy, Shannon entropy and kurtosis (“f1, f5, f7”) and combination energy, Shannon entropy and norm-entropy (“f1, f5, f9”) give almost equal accuracies. Hence, these combinations are optimal feature combinations for 11 PQ classes when SVM classifier is used. B. Decision Tree The same procedure as before is performed here, using DT classification method. DT approach is widely used in classification based on the choice of features that maximize and fix data division [5]. These features are split into several branches recursively, until termination and classification are reached. Once the DT classifier is trained, testing new data sets is a simple process. For a given vector of observations, different DTs with different accuracy levels can be built. Thus, obtaining an optimal DT through the large space dimensions is challenging. However, there are suitable algorithms which can be applied on DT and that reflect a tradeoff between accuracy and complexity [19]. In this paper, the split criterion of the DT classification method is chosen to be “deviance” for all of the trees. From the big amount of obtained accuracy results, some of the highest are given in Table III and shown in Fig. 3.
TABLE III. ON THE CHOICE OF FEATURE COMBINATION USING DT CLASSIFIER
1
20dB noise Features Acc. (f) (%) 5 73.35
30dB noise Features Acc. (f) (%) 5 82.64
40dB noise Features Acc. (f) (%) 5 88.96
50dB noise Features Acc. (f) (%) 5 91.55
Pure signal Features Acc. (f) (%) 5 95.82
2 3
5,7 5,7,9
77.15 77.75
5,7 5,7,9
86.54 86.51
3,5 3,4,5
92.69 92.76
5,8 5,7,8
94.58 94.45
1,8 5,7,8
97.40 97.25
4 5
3,5,6,7 3,4,5,7,9
77.69 77.83
1,3,7,9 1,3,5,6,7
86.98 87.07
3,5,8,9 3,5,6,8,9
93.13 93.15
2,5,7,8 4,5,7,8,9
94.85 94.54
5,6,8,9 1,5,6,8,9
97.70 97.66
6 7
2,3,4,5,6,7 1,2,3,4,5,7,9
77.86 77.86
1,3,5,6,7,9 1,2,3,5,6,7,8
87.15 86.87
1,2,4,6,8,9 2,3,4,5,6,8,9
92.99 92.99
2,4,5,6,7,8 2,4,5,6,7,8,9
94.21 93.95
1,2,4,6,8,9 1,2,3,4,6,8,9
97.54 97.49
8 9
1,2,3,4,5,6,7,9 All nine
77.83 76.98
1,2,3,4,5,6,7,9 All nine
86.86 86.45
1,2,3,4,5,6,7,9 All nine
92.75 92.28
1,2,3,4,5,6,7,8 All nine
93.45 93.40
1,2,3,4,5,6,8,9 All nine
97.51 97.15
N°
Figure 3. Effectiveness of the wavelet based features in case of DT classifier
Correspondingly to the given results, the most frequent and least numerous feature combination that gives high accuracies is the combination of two features, i.e. “f5, f7” or the combination of three features, i.e. “f5, f7, f8” or “f5, f7, f9”. The effectiveness of only these features can be noticed from Fig. 4.
Figure 4. Effectiveness of the optimal feature combination using DT classification method
From the given graph it is obvious that the accuracies obtained using two features and three features are almost
equal. Hence, the combination of Shannon entropy and kurtosis is optimal combination for 11 PQ classes when DT classifier is used. C. Random Forest RF approach is a representation of tree predictors, such that each tree depends on the values of a random feature vector sampled independently and with same distribution for all tress in the forest. Classification accuracy of the RF is measured by analyzing the generalization error. This error is an important index for measuring the extrapolation ability of the classifier [6]. The generalization error is based on a margin function which measures the extent to which the average number of votes for the correct class exceeds the average number of votes for any other class. The rule says: “The larger the margin, the better the performance of the classifier”. One of the main properties of the RF approach is that it does not overfit, as the number of the trees increases, but produce a constant value of the generalization error. In this investigation the RF classifier is constructed using one hundred trees in the forest. The obtained classification results are presented in Table IV and shown in Fig. 5. In accordance with the presented results, the combinations of two features, i.e. “f5, f8”, and three features, i.e. “f3, f5, f7” or “f5, f7, f8” are used for additional investigation,
TABLE IV. ON THE CHOICE OF FEATURE COMBINATION USING RF CLASSIFIER 20dB noise N°
Features (f)
30dB noise Features (f)
40dB noise
1 5,7
Acc. (%) 86.96 90.06
Features (f)
50dB noise
5 5,8
Acc. (%) 93.46 95.69
Features (f)
Pure signal
1 2
1 5,7
Acc. (%) 75.78 80.58
5 5,8
Acc. (%) 95.15 97.36
5 5,8
Acc. (%) 97.06 98.30
3 4
3,5,7 3,5,7,9
81.55 81.79
3,5,7 3,5,7,9
90.80 90.85
3,5,7 2,5,7,8
96.17 96.33
5,7,8 2,5,7,8
97.43 97.32
5,7,8 2,5,6,8
98.38 98.49
5 6
1,5,7,8,9 1,2,3,5,7,8
81.76 81.99
3,5,6,7,8 1,2,3,4,5,7
91.13 91.15
1,2,5,7,8 2,3,4,5,7,8
96.53 96.54
3,5,6,7,8 2,3,4,5,7,8
97.21 97.24
1,5,6,7,8 2,3,6,7,8,9
98.50 98.48
7 8
1,2,3,5,6,7,8 1,3,4,5,6,7,8,9
81.93 81.94
1,2,3,5,6,7,8 1,2,3,4,5,7,8,9
91.20 91.17
2,3,4,5,6,7,8 1,2,3,4,5,6,7,8
96.43 96.35
2,3,4,5,6,7,8 1,2,4,5,6,7,8,9
97.25 97.17
1,3,5,6,7,8,9 1,2,4,5,6,7,8,9
98.50 98.43
9
All nine
81.39
All nine
91.03
All nine
96.20
All nine
97.15
All nine
98.38
Figure 5. Effectiveness of the wavelet based features in case of RF classifier
Features (f)
graphically presented in Fig. 6. From the graphs it is evident that the obtained accuracy curves are almost overlapping. Hence, the combination consisted of Shannon entropy and log energy-entropy is the general optimal combination for 11 PQ classes when RF classifier is used.
REFERENCES [1]
[2]
[3]
[4]
[5] [6] [7] [8]
Figure 6. Effectiveness of the optimal feature combination using RF classification method
With the obtained results it can be summarized that the most effective wavelet based features are those based on entropy. Entropy, as a higher order invariant statistics, is commonly used in information theory and in the area of signal processing [20]. It provides valuable information for analyzing non-stationary signals. Entropy is used as a measure for disorder in the system. Hence, it characterizes the uncertainty in information source that behaves in the PQ signal and that property makes it the most effective feature. V.
[9]
[10]
[11]
[12]
CONCLUSION
In this work we have investigated the effectiveness of the different wavelet-based features and their combinations, as frequently used features in classification of PQ disturbances. The effectiveness of the features was experimentally tested using three classification methods, SVM, DT and RF, in case of 11 PQ classes and in presence of different noise levels. The investigation has shown that the effectiveness of a given feature differs depending on the other features it is used with, the noise level present in the signal and the type of the used classification method. With the experimental testing it was shown that in most of the cases when using two or three wavelet based features, instead of high numerous combination, there is no significant impact on the classification accuracy, while the computational efficiency of the used classification algorithm will be significantly increased, whether the PQ signal is pure or is accompanied with different levels of noise. The aim of our future work will be constructing an optimal feature extraction and classification method, for real-time classification of real PQ signals with high accuracy.
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
Emmanouil, S.; Bollen, M.H.J.; Gu, I.Y.H. Expert system for classification and analysis of power system events. IEEE Trans. Power Del., 17, pp. 423–428, 2002. Monedero, I.; León C.; Ropero, J.; García A.; Elena, J.M.; Montaño, J.C. Classification of electrical disturbances in real time using neural networks. IEEE Trans. Power Del., 18, pp. 406–1296, 2003. Dash, P.K.; Mishra, K.S.; Salama, M.M.A. Classification of power system disturbances using a fuzzy expert system and a Fourier linear combiner. IEEE Trans. Power Del., 15, pp. 472–477, 2000. Janik, P.; Lobos, T. Automated classification of power-quality disturbances using SVM and RBF networks. IEEE Trans. Power Del., 21, pp. 1663–1669, 2006. Quinlan, J. R. Induction of decision trees. Mach. Learn.,1986, 1, pp. 81–106. Breiman, L. Random Forests. Mach. Learn. , 2001, 45, pp. 5–32. Bollen, M.H.J.; Gu, I.Y.H. Signal processing of power quality disturbances. Wiley –IEEE Press, 2006. Decanini, J.G.; Tonelli-Neto, M.S.; Malange, F.C.; Minussi, C.R. Detection and classification of voltage disturbances using a fuzzyartmap-wavelet network. Electric Power Systems Research, 81, pp. 2057–2065 2011. Gaing, Z.L. Wavelet-based neural network for power disturbance recognition and classification. Power Delivery, IEEE Transactionso, 19, pp. 1560–1568, 2004. Rata, G.; Rata, M.; Filote, C. Theoretical and experimental aspects concerning Fourier and wavelet analysis for deforming consumers in power network. Elektronika ir Elektrotechnika (Electronics and Electrical Engineering), 1, pp. 62–66, 2010. Milchevski, A.; Taskovski, D. Classification of power quality disturbances using wavelet transform and SVM decision tree. In proceedings of the 11th IEEE Electrical Power Quality and Utilization Conference, Lisbon, Portugal, 17–19 October 2011. Uyara, M.; Yildirima, S.; Gencoglub, M.T. An effective wavelet-based feature extraction method for classification of power quality disturbance signals. Journal of Elec. Power Syst. Res., 78, pp. 17471755, 2008. Galil, T. K. A.; Kamel, M.; Youssef, A.M.; Saadany, E.F.E.; Salama, M.M.A. Power quality disturbance classification using the inductive inference approach. IEEE Trans. Power Deliv., 19, pp. 1812-1818, 2004. He, H.; Starzyk, J.A. A self-organizing learning array system for power quality classification based on wavelet transform. IEEE Trans. Power Deliv., 21, pp. 286- 295, 2006. Upadhyaya, S.; Mohanty, S. Localization and Classification of Power Quality Disturbances using Maximal Overlap Discrete Wavelet Transform and Data Mining based Classifiers. Elsevier, IFAC – Papers On Line, 49, pp. 437–442, 2016. Eri¸sti, H.; Yıldırım, Ö.; Eri¸sti, B.; Demir, Y. Optimal feature selection for classification of the power quality events using wavelet transform and least squares support vector machines. Int. J. Electr. Power Energy Syst., 49, pp. 95–103, 2013. Markovska, M.; Taskovski, D. On the choice of wavelet based features in power quality disturbances classification. In Proceedings of the 2017 IEEE 17th International Conference on Environment and Electrical Engineering (EEEIC), Milan, Italy, 6–9 June 2017. A library for support vector machine. Available online: https://www.csie.ntu.edu.tw/~cjlin/libsvm/ (accessed on 22 December 2016). Ray, P.K.; Mohanty, S.R.; Kishor, N.; Catalao, J.P.S. Optimal feature and decision tree-based classification of power quality disturbances in distributed generation systems. IEEE Trans. on Sustainable Energy, 5, pp. 200-208, 2014. Shannon, C. E. A mathematical theory for communication. Bell Syst. Tech. J., 27, pp. 379–423, 1948.