International Journal of Computational Intelligence Research. ISSN 0973-1873 Vol.3, No. 1 (2007), pp. 60-65 © Research India Publications http://www.ijcir.info
Features Selection of SVM and ANN Using Particle Swarm Optimization for Power Transformers Incipient Fault Symptom Diagnosis Tsair-Fwu Lee1,2, Ming-Yuan Cho1 and Fu-Min Fang2 1
Department of Electrical Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan, ROC
[email protected],
[email protected] 2
Chang Gung Memorial Hospital-Kaohsiung Medical Center, Chang Gung University College of Medicine, Kaohsiung, Taiwan, ROC
[email protected]
Abstract: For incipient power transformer fault symptom diagnosis, this paper presents a successful adaptation of the particle swarm optimization (PSO) algorithm to improve the performance of artificial neural network (ANN) and support vector machine (SVM) classifiers. A PSO-based encoding technique is applied to improve classification accuracy by removing redundant input features that may confuse the classifier. Experiments using actual data demonstrate the effectiveness and high efficiency of the proposed approach, which speeds up operation and also increases classification accuracy. Keywords: feature, particle swarm optimization, incipient fault, diagnosis.
I. Introduction Power transformers, which deliver electrical energy from the generation system to customers, are essential elements of power systems. They need to be carefully and constantly monitored, and fault diagnosis must be carried out in a timely manner so that alarms can be raised as early as possible. A transformer long in service is subject to electrical and thermal stresses, which may degrade the oil inside. As a power transformer degrades, different gases are released into the oil at different concentrations, and these can be used as symptoms to identify incipient faults. The most popular method for detecting incipient faults is dissolved gas analysis (DGA) [1], which detects degradation in the transformer oil and insulating materials prior to transformer failure. According to IEC 599/IEEE C57 [2], these degradations produce fault-related gases such as hydrogen (H2), oxygen (O2), nitrogen (N2), methane (CH4), ethylene (C2H4), acetylene (C2H2), ethane (C2H6), carbon dioxide (CO2), and carbon monoxide (CO), all of which can be detected in a DGA test. The existence of these gases indicates that corona or partial discharge, thermal heating, or arcing may be occurring [3]. The associated energy dissipation is lowest in corona or partial discharge, medium in thermal heating, and highest in arcing [4]. Table 1 shows the IEC 599/IEEE C57 criteria [2], which use three relative ratios of characteristic gases as the variables for interpretation. This standard allows classification of faults when measurements fall into the specified categories. Various fault diagnosis techniques have been developed based on DGA data, including the conventional key gas method, the ratio-based method [5], and more recently artificial intelligence methods such as expert systems, fuzzy logic [6], and artificial neural networks [7]. Some combinations of these methods are particularly promising [8]. This paper proposes a new method, based on particle swarm optimization, that improves the ability of support vector machines and artificial neural networks to classify power transformer faults. Table 1.
IEC/IEEE codes for the interpretation of DGA method
Table 1 lists fault codes 0–8. The fault types include: no fault; thermal faults classified by temperature (up to and above 700 °C); low-energy and high-energy partial discharges; and low-energy and high-energy discharges. Interpretation is based on the coded ranges (e.g. < 0.1, 0.1–1.0, > 1.0) of the three characteristic gas ratios C2H2/C2H4, CH4/H2, and C2H4/C2H6.

To determine the global best, gbest is replaced by pbest_i whenever

f(pbest_i) > f(gbest),   (7)
where f(x) is the objective function to be maximized. Step 5 Termination Checking: Repeat Step 2 to Step 4 until a specified termination condition is met, such as reaching a predefined number of iterations or failing to make progress for a certain number of iterations. Once terminated, PSO reports gbest and f(gbest) as its solution. To exhibit the power of PSO, a test program was applied to the following test function, visualized in Figure 1:
Figure 1. Objective function F1
F1(x, y) = −x sin(√|x|) − y sin(√|y|),  −500 ≤ x, y ≤ 500,   (8)

where the global optimum is F1(−420.9687, −420.9687) = 837.9658.
In the testing experiment, both learning factors, c1 and c2, are set to 2, and a variable inertia weight w is used following the suggestion of Shi and Eberhart (1999). The same parameters are used in the proposed PSO-based SVM/ANN classifiers. Figure 2 reports the progress of PSO on the test function F1(x, y) for the first 300 iterations. At the end of 1000 iterations, F1(−420.971954, −420.956055) = 837.965759 is located, which is very close to the global optimum. It is therefore worthwhile to look into the dynamics of PSO. Figure 3 presents the distribution of particles at different iterations. There is a clear trend: particles start from their initial positions and fly toward the global optimum [12]. PSO shares many features with the genetic algorithm (GA), such as random generation of the initial population, search for optima by updating generations or iterations, and evaluation of a fitness or objective function for candidate solutions. However, the two differ in how search experience is shared: all chromosomes of a GA share unclassified search experience with each other, while each PSO particle shares only its own search experience (local best) and its companions' search experience (global best).
Figure 2. Progress of PSO on objective function F1.
(a) 0th iteration. (b) 10th iteration. (c) 100th iteration. (d) 300th iteration.
Figure 3. The distribution of particles at different iterations.
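As an illustrative sketch only (the velocity and position updates follow the standard PSO form with a linearly decreasing inertia weight, and the swarm size of 30 is an assumption, since the paper does not state the one used), the search on F1 can be reproduced as follows:

```python
import math
import random

def f1(x, y):
    # Test function F1 of Eq. (8); global maximum ~837.9658 at (-420.9687, -420.9687)
    return -x * math.sin(math.sqrt(abs(x))) - y * math.sin(math.sqrt(abs(y)))

def pso_f1(n_particles=30, iters=1000, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-500, 500), rng.uniform(-500, 500)] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                       # personal best positions
    pbest_val = [f1(*p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]      # global best
    c1 = c2 = 2.0                                     # learning factors, as in the text
    for t in range(iters):
        w = 0.9 - 0.5 * t / iters                     # linearly decreasing inertia weight
        for i in range(n_particles):
            for d in range(2):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # keep particles inside the search domain [-500, 500]
                pos[i][d] = min(500.0, max(-500.0, pos[i][d] + vel[i][d]))
            val = f1(*pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

With c1 = c2 = 2 and 1000 iterations, runs of this sketch typically approach the global optimum value of 837.9658, mirroring the behavior reported in Figure 2.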
Figure 4. Overall structure of the proposed method. In the learning stage, DGA data acquisition (or a database dataset) is followed by preprocessing (feature extraction and data selection); PSO-based feature selection, with leave-one-out (loo) validation of the SVM on the training and validation dataset, yields the set of selected features. In the classification stage, the trained SVMs classify the test dataset and output the fault situation.
V. Proposed Fault Symptom Diagnosis Method Many practical classification tasks involve a high-dimensional input space, in which we want to find the combination of original input features that contributes most to the task. The overall structure and main components of the proposed method are depicted in Figure 4. As is well known, SVMs are trained through the optimization procedure given by equations (1) and (3), where K(·) is called the kernel function [13]. The value of the kernel equals the inner product of the two vectors xi and xj in the feature space; that is, K(xi, xj) = Φ(xi)·Φ(xj). Any function that meets Mercer's condition [10] can be used as the kernel function; in this work the RBF kernel (4) is used, where C weights the penalizing relaxation variable ξ and σ denotes the width of the RBF kernel. The PSO technique is exploited in the proposed SVM model to obtain the optimal set of features: PSO performs an optimal search on the feature space, and the results are compared with those of the ANN. Training was carried out first with the ANN, and the same experiments were then repeated with the SVM, using both constant-width RBF kernels, where the σ parameter is found by iteration to one decimal place, and an average σ parameter with a range of scaling factors from 1 to 10. The three parameters are then fine-tuned using the leave-one-out (loo) procedure. Three DGA feature sets were used in the feature extraction phase. We consider not only the basic gas ratios of the DGA data, but also their mean (µ), root mean square (rms), variance (σ²), skewness (normalized third central moment, γ3), kurtosis (normalized fourth central moment, γ4), and the higher-order normalized central moments (γ5–γ10) from fifth to tenth order, as follows [14]:

γn = E{[yi − µ]ⁿ} / σⁿ,  n = 1, …, 10,   (9)
where E represents the expected value. The total number of derived features is 36, composing a feature vector Y = [y1 y2 … y36]T. We randomly extracted instances for the learning, validation, and test sets of the corresponding phases. In this case we have a search space of size 2^36. We represent this search space by a standard binary encoding in which a "1" indicates selection of the feature at the corresponding position. The fitness of each feature subset is evaluated by cross-validation. In the training phase, the problems faced in classification are not well bounded; it is impossible to determine whether the training data presented is truly representative of the data that will be seen in practice. The intention of this step is therefore to train the classifier to recognize faults in general, not just the specific faults contained in the training data. For this reason, a validation set is used to prevent the SVM from becoming overtrained on the training data and the faults it contains. The validation and training sets are both used to measure classification success for the same reason, as it is desirable to have some measure of performance over two different sets of data that may be statistically quite different. SVMs were created using the kernel function described in (4); σ values were determined by an iterative process for the constant-width SVM. In general, SVM training is a complicated nonlinear global optimization problem; the number of support vectors is minimized if a bigger margin is chosen first. In addition, k-fold cross-validation is traditionally used as an estimator of the generalization error [15]. In this study, the negative mean absolute percentage error (−MAPE) is used as the fitness function. The MAPE is defined as [16]:

MAPE = (1/N) Σᵢ₌₁ᴺ |aᵢ − dᵢ| / aᵢ × 100%,   (10)

where aᵢ and dᵢ represent the actual and diagnosed values and N is the number of diagnosis periods.
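As a minimal sketch (not the authors' code), the derived statistics of (9) and the −MAPE fitness of (10) can be written as follows; the input y is assumed to be one series of DGA gas-ratio values, and the actual values aᵢ are assumed nonzero:

```python
import math

def derived_features(y):
    # Mean, rms, variance, and normalized central moments gamma_3..gamma_10 (Eq. 9)
    n = len(y)
    mu = sum(y) / n
    rms = math.sqrt(sum(v * v for v in y) / n)
    var = sum((v - mu) ** 2 for v in y) / n
    sigma = math.sqrt(var)
    gammas = [sum((v - mu) ** k for v in y) / n / sigma ** k for k in range(3, 11)]
    return [mu, rms, var] + gammas

def neg_mape(actual, diagnosed):
    # Negative mean absolute percentage error (Eq. 10); larger (closer to 0) is better
    n = len(actual)
    mape = 100.0 / n * sum(abs(a - d) / abs(a) for a, d in zip(actual, diagnosed))
    return -mape
```

For example, neg_mape([100, 200], [90, 220]) gives −10.0, since both diagnoses are 10% off.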
In the general PSO algorithm, the program terminates when the best fitness has changed by less than a very small value (e.g. 10⁻⁵) over the last generations, or when a predefined number of iterations is reached. Under this fitness function, particles with higher fitness values are more likely to yield a smaller MAPE. The optimization procedure is terminated when the −MAPE value falls below a threshold or stalls for more than 30 generations. Once terminated, particle swarm optimization reports gbest and f(gbest) as its solution. At the end of the learning stage, a set of features is obtained for the PSO-based SVM classifier. After the learning phase, the classifier is ready to classify new DGA pattern samples in the classification phase. In this paper, we demonstrate PSO's efficiency in selecting features of an SVM with an RBF kernel.
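The feature-selection loop described above can be sketched with a binary-encoded PSO. This is an illustrative sketch, not the authors' implementation: the fitness argument is a placeholder for the cross-validated −MAPE of a classifier trained on the masked features, and the sigmoid mapping from velocity to bit probability is the common binary-PSO variant, assumed here since the paper does not give its encoding update.

```python
import math
import random

def pso_feature_select(n_features, fitness, n_particles=20, iters=50, seed=1):
    """Binary PSO: each particle is a bit mask over features; a '1' selects the
    feature at that position. fitness(mask) returns a score (e.g. -MAPE from
    cross-validation) where higher is better."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    c1 = c2 = 2.0
    for t in range(iters):
        w = 0.9 - 0.5 * t / iters
        for i in range(n_particles):
            for d in range(n_features):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # sigmoid maps velocity to the probability of setting the bit
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d])) else 0
            val = fitness(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In the paper's setting n_features would be 36 and the returned mask would typically retain only a few features, with the rest discarded as redundant.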
VI. Experimental Results

A DGA dataset was compiled to study the feasibility of this approach. The dataset comprises 720 historical dissolved gas records from a variety of power transformers of the Taipower Company, the sole electricity supplier in Taiwan, and is classified into the 6 IEC 60599 category descriptions based on actual conditions. The total number of derived features is 36. We randomly extracted instances from these datasets to create the training, validation, and test sets for the corresponding phases [9]. The networks were trained using training and validation sets of 300 samples each, while testing was carried out on another 120 samples. Performance is measured in terms of the network's classification success on unseen testing data. Training is stopped when the classification performance on the validation set starts to diverge from that on the training set. In this section, classification results are presented for straightforward ANNs and SVMs, both with and without feature selection. For each ANN the number of neurons in the hidden layer was fixed at 36; for each SVM the width (σ) was set to 0.5 and the tradeoff regularization parameter (C) to 100. These values were chosen on the basis of initial training trials. Table 2 shows classification test results for the basic ANN and SVM methods on this dataset without automatic PSO selection of features; in this case, all the features from the DGA signals were used. The success rates are 82.4% for the ANN and 89.7% for the SVM.

Table 2. Performance without PSO optimization for the Taipower Company dataset

Classifier  No. of features  Training success (%)  Test success (%)  Training time (s)
ANN         36               78.3                  82.4              22.221
SVM         36               79.4                  89.7              4.704

Table 3 shows the results after PSO optimization, demonstrating that the proposed method is capable of improving classification systems by removing redundant and potentially confusing input features. The input features were automatically selected from the 36 DGA features described above by the proposed method; 4 features remain for both the ANN and the SVM. The table reports the best results over the three DGA feature sets and their variance. In the ANN case, the number of neurons in the hidden layer was also selected, in the range 10–36; for the SVMs, the RBF kernel width was also selected, in the range 0.1–2.0. The test performance improved substantially with feature selection for both the ANN and the SVM: the success rate was 99.4% for the ANN and 99.8% for the SVM classifier. The computational times for training these four classifiers (measured on a PC with a 2.6 GHz Pentium IV processor and 512 MB RAM) are also shown, though a direct comparison is difficult because code efficiency differs in each case; these values are given for rough comparison only. In any case, the difference in training time should not be very important, as training can be done off-line. The simulation results demonstrate that a PSO-based approach can acquire optimal features for any given SVM classifier design criteria.

Table 3. Performance with PSO optimization for the Taipower Company dataset

Classifier  No. of features  Training success (%)  Test success (%)  Training time (s)
ANN         4                100                   99.4              6.335
SVM         4                100                   99.8              2.476
VII. Conclusion In this paper, the selection of appropriate input features has been optimized using a PSO-based approach. A procedure was presented for detecting power transformer fault situations using two classifiers, ANNs and SVMs, with PSO-based feature selection from an inherently imprecise DGA dataset. After feature selection, performance on the testing dataset increased in all cases. The classification accuracy of SVMs was better than that of ANNs when applied without PSO; with PSO-based selection, the performances of the two classifiers were comparable, at nearly 100%. Test results from practical data demonstrate the effectiveness and high efficiency of the proposed approach, which makes operation faster and also increases classification accuracy.
References
[1] IEC Publication 599, "Interpretation of the Analysis of Gases in Transformers and Other Oil-Filled Electrical Equipment in Service," First Edition, 1978.
[2] "IEEE Guide for the Interpretation of Gases Generated in Oil-Immersed Transformers," IEEE Std. C57.104, 1991.
[3] R.R. Rogers, "IEEE and IEC Codes to Interpret Faults in Transformers Using Gas in Oil Analysis," IEEE Trans. Electr. Insul., vol. 13, no. 5, pp. 349–354, 1978.
[4] M. Dong, "Fault Diagnosis Model for Power Transformer Based on Statistical Learning Theory and DGA," IEEE 2004 International Symposium on Electrical Insulation, pp. 85–88, 2004.
[5] P. Purkait, S. Chakravorti, "An Expert System for Fault Diagnosis in Transformers During Impulse Tests," in: PES Winter Meeting, vol. 3, 23–27 Jan. 2000.
[6] Q. Su, C. Mi, L.L. Lai, P. Austin, "A Fuzzy Dissolved Gas Analysis Method for the Diagnosis of Multiple Incipient Faults in a Transformer," IEEE Trans. Power Systems, vol. 15, no. 2, pp. 593–598, 2000.
[7] L.B. Jack, A.K. Nandi, "Fault Detection Using Support Vector Machines and Artificial Neural Networks, Augmented by Genetic Algorithms," Mech. Syst. Signal Process., vol. 16, no. 2–3, pp. 373–390, 2002.
[8] C.H. Yann, "A New Data Mining Approach to Dissolved Gas Analysis of Oil-Insulated Power Apparatus," IEEE Trans. Power Deliv., vol. 18, no. 4, pp. 1257–1261, 2003.
[9] M. Duval, "Interpretation of Gas-in-Oil Analysis Using New IEC Publication 60599 and IEC TC 10 Databases," IEEE Electrical Insulation Magazine, vol. 17, no. 2, pp. 31–41, 2001.
[10] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
[11] O. Chapelle, V. Vapnik, O. Bousquet, S. Mukherjee, "Choosing Multiple Parameters for Support Vector Machines," Machine Learning, vol. 46, no. 1, pp. 131–159, 2002.
[12] S.-C. Chu, C.-S. Shieh, J.F. Roddick, "A Tutorial on Meta-Heuristics for Optimization," in J.-S. Pan, H.-C. Huang, L.C. Jain (Eds.), Intelligent Watermarking Techniques, World Scientific Publishing Company, Singapore, ch. 4, pp. 97–132, 2004.
[13] H. Fröhlich, "Feature Selection for Support Vector Machines by Means of Genetic Algorithms," Master's thesis, University of Marburg, 2002.
[14] B. Samanta, "Gear Fault Detection Using Artificial Neural Networks and Support Vector Machines with Genetic Algorithms," Mech. Syst. Signal Process., vol. 18, pp. 625–644, 2004.
[15] C. Cortes, V. Vapnik, "Support-Vector Networks," Machine Learning, vol. 20, no. 3, pp. 273–295, 1995.
[16] P.F. Pai, W.C. Hong, "Forecasting Regional Electricity Load Based on Recurrent Support Vector Machines with Genetic Algorithms," Electric Power Systems Research, vol. 74, pp. 417–425, 2005.
Author Biographies Tsair-Fwu Lee is a doctoral candidate in the Department of Electrical Engineering at National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan. He received the M.S. degree in computer and communication engineering from National Kaohsiung First University of Science and Technology, Kaohsiung, and the B.S. degree in electronic engineering from National Kaohsiung University of Applied Sciences, Kaohsiung. His major research interests are evolutionary optimization and artificial intelligence algorithms for power system and image applications.
Ming-Yuan Cho was born in Kaohsiung, Taiwan, ROC, in 1962. He received the diploma in electrical engineering from the Kaohsiung Institute of Technology, Kaohsiung, Taiwan, ROC, in 1982, and the M.S. and Ph.D. degrees in electrical engineering from National Sun Yat-sen University, Kaohsiung, Taiwan, ROC, in 1989 and 1992, respectively. Since 1992, he has been with the Department of Electrical Engineering, National Kaohsiung University of Applied Sciences, where he is currently a Professor and the Dean of Continuing and Extension Education. He also served as chairman of the Department of Electrical Engineering from August 2000 to July 2003. His research interests include AI applications to power systems, power system monitoring and control, electric energy management and control, and green power control.
Fu-Min Fang is an associate professor and the chief attending physician of Radiation Oncology at Chang Gung Memorial Hospital-Kaohsiung Medical Center, Kaohsiung, Taiwan. He earned the M.D. degree from Taipei Medical University and the Ph.D. degree from the Graduate Institute of Medicine, Kaohsiung Medical University. He has been a Visiting Assistant Professor in the Department of Experimental Radiation Oncology at the UT MD Anderson Cancer Center, Houston, USA. Dr. Fang has published more than 50 papers, with interests in the clinical practice and technical evolution of intensity-modulated radiotherapy for cancer patients.