Performance Comparison of Supervised Machine Learning Algorithms for Multiclass Transient Classification in a Nuclear Power Plant

Manas Ranjan Prusty1,*, Jaideep Chakraborty1, T. Jayanthi1, and K. Velusamy2

1 Computer Division, Indira Gandhi Centre for Atomic Research, Kalpakkam, India
[email protected], {jaideep,jayanthi}@igcar.gov.in
2 Mechanics & Hydraulics Division, Indira Gandhi Centre for Atomic Research, Kalpakkam, India
[email protected]

Abstract. For safety critical systems in a nuclear power plant (NPP), accurate classification of multiclass transients leads to safer operation of the plant. Supervised machine learning is a key technique for solving multiclass classification problems. The most widely used multiclass supervised machine learning methods for this purpose are the k-nearest neighbor algorithm, the support vector machine algorithm and the artificial neural network (ANN) algorithm. This paper describes a comparative study of the performance of these algorithms in classifying some of the transients in an NPP. The performance analysis is based mostly on the prediction accuracy in identifying the transient that actually occurred. Along with prediction accuracy, the total number of epochs, the training time and the root mean square error were also observed as characteristic features for determining the performance of any backpropagation ANN. A 10-fold cross validation was carried out ten times on all these algorithms, and the best among them was finally identified for multiclass transient classification in NPP.

Keywords: Nuclear power plant, Supervised machine learning, k-nearest neighbor, Support vector machine, Artificial neural network, k-fold cross validation.

1 Introduction

A nuclear power plant (NPP) contains many safety critical systems. For an operator sitting in the main control room of an NPP, decision making becomes very difficult when he is overloaded with too much information during an adverse situation. This information comes from the big displays, small displays, control panels, consoles, annunciators, flash lights, LEDs, hooters, etc. At such a moment, the operator should be provided with only the most important information related to that adverse situation. The occurrence of a transient is regarded as an adverse condition in which the plant deviates from its steady or normal state to an abnormal condition. During this time, the operator's action plays a key role in bringing the plant back to a stable state. In order to achieve this, the operator has to properly identify the transient which led to the abnormal state. There are also cases where identifying the transient during a post-mortem process is an important task, based on the history of the data collected during that time. These are some of the instances where classification of transients is vital. In an NPP, a number of transients may occur during the running state of the plant, which makes this a multiclass classification problem. A multiclass classification problem can be dealt with using supervised learning [1]. Supervised learning is a machine learning methodology where a training set of data with known input and output labels is used to train a system for unknown or test data. The output label of a test data point is found by mapping the test data onto a function inferred from the training data. Out of the various supervised learning algorithms for multiclass classification, k-nearest neighbor (kNN), support vector machine (SVM) and artificial neural network (ANN) are popularly used [2-8]. Fuzzy logic can also be used for online transient identification in NPP [9]. A number of modifications have been made to these algorithms for classification of power quality disturbances [10-11]. The performance of these algorithms is measured based on the prediction accuracy, training speed, computational cost and root mean square error. As this paper is based on transient classification in an NPP, the prediction accuracy is the major performance parameter considered; the other performance parameters are also examined with keen interest.

In this paper, Section 2 briefly explains the different supervised machine learning algorithms that have been considered for multiclass classification of transients. Section 3 explains the methodology adopted for the execution of the code for these algorithms. Section 4 presents the results and discussion on the performance of each algorithm. Section 5 concludes the paper, inferring the best supervised machine learning algorithm for multiclass transient classification from the analyzed algorithms.

2 Training Algorithms

The various supervised machine learning algorithms considered in this paper are the kNN algorithm, SVM, and backpropagation algorithms for ANN. A brief summary of these algorithms is given in this section.

2.1 Overview of kNN algorithm

The kNN algorithm is a supervised machine learning algorithm which classifies a query or test data point based on the k nearest training data points taken as reference. It is a very simple and robust algorithm. It requires no separate training phase, as the training samples themselves are used at execution time. This algorithm proves to be very effective in reducing the misclassification error when the number of samples in the training dataset is large [12]. Another advantage of the kNN method over many other supervised machine learning methods such as SVM, decision trees and neural networks is that it can easily deal with problems in which the number of classes is three or higher [13]. It can therefore be readily used in incremental learning environments (adding new training data during execution), but its execution time is usually longer than that of algorithms which have a separate training phase.

2.2 Overview of SVM algorithm

SVM is a supervised machine learning algorithm which is mostly used for classification or regression analysis. An SVM constructs an optimal hyperplane (OHP) with the largest distance to the nearest training data of the opposite classes, called support vectors. This distance is called the functional margin, which is inversely proportional to the generalization error of the classifier [14]. DirectSVM follows an iterative update scheme based on a few intuitively simple heuristics [15]. Another way of obtaining the maximum margin hyperplane is by creating nonlinear classifiers using the kernel trick. Some commonly used kernels are the polynomial kernel, the Gaussian radial basis function (rbf) kernel and the hyperbolic tangent kernel [16]. Multi-kernel SVMs are also used, with high accuracy and good generalization [17]. In this paper, various single-kernel functions are used for classification and the prediction accuracy is reported.
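The commonly used kernels mentioned above are simple closed-form functions. As an illustrative sketch (not the MATLAB code used in this study), they can be written as follows; the parameter values shown are arbitrary defaults chosen for demonstration:

```python
import math

def linear(x, y):
    # k(x, y) = <x, y>
    return sum(a * b for a, b in zip(x, y))

def polynomial(x, y, degree=3, c=1.0):
    # k(x, y) = (<x, y> + c)^degree
    return (linear(x, y) + c) ** degree

def rbf(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), the Gaussian rbf kernel
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def tanh_kernel(x, y, alpha=0.1, c=0.0):
    # k(x, y) = tanh(alpha <x, y> + c), the hyperbolic tangent kernel
    return math.tanh(alpha * linear(x, y) + c)
```

Each of these replaces the inner product in the SVM optimization, which is what allows a linear margin in the induced feature space to act as a nonlinear boundary in the original attribute space.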

2.3 Overview of ANN algorithm

ANN is a supervised learning algorithm in which a network is established between neurons in the input, output and one or more hidden layers. These layers consist of interconnected neurons, and their corresponding optimized weights yield the final output. The performance of the network is measured by the prediction accuracy, the number of epochs, the training time and the mean square error (MSE). ANN is used in classification and regression problems. Backpropagation is an optimization process whose aim is to reach the minimum error in the least number of epochs and the least training time, which improves the performance of the network. There are many ways of carrying out the backpropagation process; they fall into six broad categories [18]. In this paper, these six classes of backpropagation algorithm have been considered, and MATLAB is used for the execution via its training functions.

1. Additive momentum
   - Gradient descent with momentum backpropagation (GDM)
2. Self-adaptive learning rate
   - Gradient descent with adaptive learning rate backpropagation (GDA)
   - Gradient descent with momentum and adaptive learning rate backpropagation (GDMA)
3. Resilient backpropagation (RB)
4. Conjugate gradient backpropagation
   - Scaled conjugate gradient backpropagation (SCG)
   - Conjugate gradient backpropagation with Powell-Beale restarts (CGB)
   - Conjugate gradient backpropagation with Fletcher-Reeves updates (CGF)
   - Conjugate gradient backpropagation with Polak-Ribière updates (CGP)
5. Quasi-Newton
   - Levenberg-Marquardt backpropagation (LM)
   - BFGS quasi-Newton backpropagation (QN)
   - One-step secant backpropagation (OSS)
6. Bayesian regularization (BR)
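As an example of the first category, the gradient descent with momentum (GDM) weight update can be sketched as follows. This is an illustrative pure-Python sketch of the update rule, not MATLAB's training function; the learning rate, momentum value and toy objective are our own choices:

```python
def gdm_step(w, grad, velocity, lr=0.1, momentum=0.9):
    """One gradient-descent-with-momentum update: the new step blends the
    previous step (scaled by the momentum term) with the current gradient."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Minimize f(w) = w^2 (gradient 2w) as a toy stand-in for the network error.
w, v = 5.0, 0.0
for _ in range(200):
    w, v = gdm_step(w, 2.0 * w, v)
# w is now close to the minimizer w = 0.
```

The momentum term lets the update carry inertia across epochs, smoothing oscillations; the adaptive-learning-rate variants (GDA, GDMA) additionally grow or shrink `lr` depending on whether the error decreased in the previous epoch.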

3 Task and Methodology

Transient classification in an NPP during a crisis situation is a challenging task. Supervised machine learning could be used to make life simpler for the operators in the main control room. This need not act as an expert system, but it could certainly guide the operator and avoid information overloading. It could also be used as a post-mortem tool after an accident has happened. Out of the huge variety of supervised machine learning algorithms, some very widely used multiclass classification algorithms have been chosen for this purpose. In this section, we compare the performance of kNN, SVM and ANN for classification of some of the steam-water side transients. The dataset collected consists of five classes. These five classes represent five transients in the steam-water side of the Prototype Fast Breeder Reactor (PFBR) operator training simulator [19]. This dataset was divided into a training dataset and a test dataset. The training dataset consisted of 746 rows, and each row had 2 columns. The number of rows represents the total number of training data and the number of columns represents the attribute values corresponding to each training data point. The training dataset had a training dataset label representing the class to which each row belongs, i.e. 746 rows and one column in this training dataset label. These two datasets are used for training purposes. The test dataset consisted of 32 rows, and each row had 2 columns. The test dataset was used for testing purposes. Three datasets, namely the training dataset, the training dataset label and the test dataset, were fed to the code. The test dataset label was not fed to the code, as it was used only for verification purposes.

3.1 Classification using kNN algorithm

A training dataset was prepared containing two attributes and a multiclass label. MATLAB was used as the medium for execution of the code. The code was run for various values of k from 3 to 51. This was done in order to check the performance of the kNN algorithm for different values of k and to find the best among them. The performance was also checked using the 10-fold cross validation method, repeated ten times.

3.2 Classification using SVM algorithm

The same training dataset was used to check the performance of the SVM algorithm. The code was executed in MATLAB using its inbuilt function. For the SVM algorithm, different kernels were used, such as the linear, quadratic, polynomial and radial basis function kernels. The percentage accuracy for each kernel was noted. The average prediction accuracy of 10-fold cross validation repeated ten times on the SVM algorithm was also noted.

3.3 Classification using Backpropagation ANN algorithm

The same training dataset was used to check the performance of a number of backpropagation ANN algorithms on a two-layered network. This network consisted of two inputs to the input layer, one hidden layer and one output from the output layer. The number of neurons in the hidden layer was varied from 5 to 20 across executions. This was done in order to observe the influence of the number of neurons in the hidden layer of a two-layered neural network. The code was executed in MATLAB using the inbuilt functions for the specific backpropagation ANN algorithms. Before executing the program, the training dataset label was modified according to the MATLAB format for executing ANN algorithms. The performance of each algorithm was noted based on the prediction accuracy, number of epochs, training time and root mean square error. A 10-fold cross validation was done on each algorithm ten times. Performance parameters such as the average prediction accuracy, average root mean square error, total number of epochs, total training time and the best prediction accuracy from the cross validation process were also observed.
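The 10-fold cross validation used throughout partitions the 746 training samples into ten disjoint folds, each serving once as the held-out set while the remaining nine form the training set. A minimal sketch of the index partitioning (an assumption about the exact splitting scheme, which MATLAB's routines handle internally) is:

```python
def k_fold_indices(n_samples, k=10):
    """Split sample indices 0..n_samples-1 into k roughly equal, disjoint folds."""
    folds = []
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for i in range(k):
        # The first `remainder` folds absorb one extra sample each.
        stop = start + fold_size + (1 if i < remainder else 0)
        folds.append(list(range(start, stop)))
        start = stop
    return folds

# For the 746-row training dataset used in this paper:
folds = k_fold_indices(746, k=10)
```

Repeating the whole procedure ten times and averaging, as done here, reduces the variance introduced by any single random partition.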

4 Results and Discussion

Figure 1 shows the performance of a kNN algorithm based transient classification system with different k-values. Figure 1(a) shows the prediction accuracy, which varied from 71% to 75% for k-values ranging from 3 to 51. This accuracy did not follow any specific pattern for subsequent changes in the k-value. Figure 1(b) shows the average prediction accuracy for 10-fold cross validation of the kNN algorithm done ten times for transient classification. This varied from 72% to 76%. As a rule of thumb, the k-value is taken to be the square root of the number of training samples, i.e. 27, to get the maximum prediction accuracy. But here, the prediction accuracy at this k-value was found to be 74%. The average prediction accuracy after cross validation was around 76%, which is not considerably different from the accuracies at the other k-values. This shows that there is no significant change in the prediction accuracy of the model as the k-value varies from 3 to 51 for the kNN algorithm.
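The k-sweep described above can be illustrated with a minimal majority-vote kNN classifier. The toy two-attribute data below are hypothetical stand-ins for the plant dataset, which has two attributes per row:

```python
from collections import Counter

def knn_predict(train, labels, query, k):
    """Classify `query` by majority vote among its k nearest training points
    (squared Euclidean distance over the attribute values)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train, labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy two-attribute data with two classes (the paper's data has five classes).
train = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
labels = ["A", "A", "B", "B"]
for k in (1, 3):
    print(k, knn_predict(train, labels, (0.05, 0.1), k))
```

Sweeping k and scoring each choice on held-out folds, as in Figure 1, is how the best k is selected in practice.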

Figure 1: (a) Prediction accuracy of kNN algorithm for different k-values. (b) Average prediction accuracy for 10-fold cross validation of kNN algorithm for different k-values.

Figure 2 shows the performance of an SVM algorithm based transient classification system for different kernel functions. From Figure 2(a), the prediction accuracy of the linear kernel was found to be 41.5%. All the other kernels taken into consideration produced prediction accuracies ranging from around 90% to 94%. The rbf kernel and the polynomial kernels of order 3, 6, 8 and 9 produced the maximum prediction accuracy of 93.75%. Figure 2(b) shows the average prediction accuracy for 10-fold cross validation of the SVM algorithm done ten times for transient classification. The maximum average prediction accuracy was found to be around 94% for the rbf kernel function and for almost all polynomial kernel functions, except the polynomial function of degree 10, which had an average accuracy of around 90%.

Figure 2: (a) Prediction accuracy of SVM algorithm for different kernels. (b) Average prediction accuracy for 10-fold cross validation of SVM algorithm for different kernels.

The performance parameters considered in this paper for the analysis of the backpropagation ANN algorithms were the prediction accuracy, root mean square error, number of epochs and training time. In an NPP, the most important performance parameter to consider is the prediction accuracy of the transient classification system. Figure 3 shows the performance of different backpropagation ANN algorithms with different numbers of neurons in the single hidden layer. It shows that the BR backpropagation ANN algorithm with 5 neurons in the hidden layer was able to achieve the best prediction accuracy with the least root mean square error, as shown in Figure 3(a) and Figure 3(b). The number of epochs taken is also low compared to the others, as shown in Figure 3(c). The training time for BR backpropagation increases as the number of neurons in the hidden layer increases, as shown in Figure 3(d). This is acceptable for safety critical systems in NPP, where the prediction accuracy has greater importance than any other performance parameter. This is the best performance achieved among all the supervised machine learning algorithms analyzed in this paper. None of the other backpropagation ANN algorithms was able to produce this prediction accuracy, even with a higher number of neurons. With 5 neurons in the hidden layer, the other backpropagation ANN algorithms which produced appreciable prediction accuracy are the RB, SCG, GDA and GDMA backpropagation ANN algorithms. These algorithms could be used for systems which are comparatively less critical and where the processing load is limited. Again, there is no clear pattern in any of the performance parameters of any of the backpropagation ANN algorithms as the number of neurons is increased.
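For reference, the root mean square error reported for each network is computed from the desired and actual outputs in the usual way; a minimal sketch:

```python
import math

def rmse(targets, outputs):
    """Root mean square error between desired outputs and network outputs."""
    n = len(targets)
    return math.sqrt(sum((t - o) ** 2 for t, o in zip(targets, outputs)) / n)
```

A lower RMSE at the same prediction accuracy indicates that the network's raw outputs sit further from the decision boundaries, which is why it is tracked alongside accuracy here.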

Figure 3: (a) Prediction accuracy, (b) Root mean square error, (c) Number of epochs, (d) Training time for backpropagation ANN algorithms.

Figure 4 shows the averages of the performance parameters of the 10-fold cross validation of these backpropagation ANN training algorithms done ten times. Figure 4(a) shows that for the BR backpropagation ANN algorithm, an 8-neuron hidden layer produces a slightly better average prediction accuracy than a 5-neuron hidden layer. Figure 4(b) shows that the average root mean square error of the BR backpropagation ANN algorithm increases slightly. These changes happen because the cross validation result is the average of all the performance values captured over the specified number of iterations or folds. These characteristic features are data dependent: depending on the nature of the training dataset, the performance of these machine learning algorithms on the test dataset could be determined. The total number of epochs and the total training time are within an acceptable range for the BR backpropagation ANN algorithm for safety critical systems, as shown in Figure 4(c) and Figure 4(d). Figure 4(e) shows that the best prediction accuracy is around 99% irrespective of the number of neurons in the hidden layer for the BR backpropagation ANN algorithm. Finally, from Table 1 it is evident that the BR backpropagation ANN algorithm is computationally cheap and produces the most acceptable performance among the analyzed supervised machine learning algorithms for safety critical systems in NPP.

Figure 4: (a) Average prediction accuracy, (b) Average root mean square error, (c) Total number of epochs, (d) Total training time, (e) Best prediction accuracy for backpropagation ANN algorithms.

Table 1. Comparison of kNN, SVM and ANN algorithms

Algorithm  | Best Prediction Accuracy (%) | Respective Parameter | Average prediction accuracy after 10-times 10-fold cross validation (%)
kNN        | 75.20 | k = 13     | 75.96
SVM        | 93.75 | rbf kernel | 93.17
GDM-ANN    | 61.60 | 5 neurons  | 24.1
GDA-ANN    | 94.60 | 8 neurons  | 90.66
GDMA-ANN   | 99.10 | 8 neurons  | 89.5
RB-ANN     | 97.30 | 13 neurons | 95.29
CGB-ANN    | 96.40 | 15 neurons | 81.67
CGF-ANN    | 85.70 | 15 neurons | 88.57
CGP-ANN    | 98.20 | 8 neurons  | 75.6
SCG-ANN    | 97.30 | 5 neurons  | 91.8
QN-ANN     | 94.60 | 15 neurons | 90.62
LM-ANN     | 95.50 | 8 neurons  | 93.84
OSS-ANN    | 91.10 | 15 neurons | 90.25
BR-ANN     | 99.10 | 5 neurons  | 95.68

From Table 1, the best two classifiers based on the cross validation accuracy are RB-ANN and BR-ANN. For better understanding and clarity, the ROC of these two algorithms is shown in Figure 5. It shows that BR-ANN performs better than RB-ANN based on ROC. Furthermore, a statistical analysis of both algorithms was conducted using a right-tailed paired-sample t-test at the 5% significance level to come to an inference. It was finally concluded from all these tests that BR-ANN performed better than RB-ANN.
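The right-tailed paired-sample t-test statistic can be sketched as follows. The per-fold accuracies shown are hypothetical stand-ins, not the values obtained in this study; the critical value is read from standard t tables:

```python
import math
from statistics import mean, stdev

def paired_t_statistic(sample_a, sample_b):
    """t statistic for a paired-sample t-test on matched accuracy scores:
    t = mean(d) / (stdev(d) / sqrt(n)), where d are the pairwise differences."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Hypothetical per-fold accuracies (NOT the paper's data), BR-ANN vs RB-ANN.
br = [0.96, 0.95, 0.97, 0.96, 0.94]
rb = [0.95, 0.94, 0.95, 0.96, 0.93]
t = paired_t_statistic(br, rb)
# Right-tailed test at 5% significance: reject H0 (no improvement) if t
# exceeds the critical value t_{0.05, n-1} (2.132 for 4 degrees of freedom).
```

Pairing the folds removes the between-fold variance, so the test compares the two algorithms on exactly the same data splits.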

Figure 5: (a) ROC of RB-ANN. (b) ROC of BR-ANN.

5 Conclusion

In this paper, it is concluded that Bayesian regularization backpropagation ANN is the best among the considered supervised machine learning algorithms for transient classification in safety critical systems in NPP. This paper also shows that high prediction accuracy can be achieved with a small number of neurons in a two-layered neural network containing a single hidden layer. This prediction accuracy is better than that of kNN and SVM as well. The root mean square error was also found to be within limits. The training time is shorter and the number of epochs smaller for this algorithm than for any other backpropagation ANN algorithm. For comparatively less critical systems with limited processing load, resilient backpropagation, gradient descent with adaptive learning rate backpropagation and scaled conjugate gradient backpropagation also perform well, with good prediction accuracy and lower training time.

6 Acknowledgement

The authors express their sincere thanks to the PFBR Operator Training simulator (KALBR-SIM) team members and Shri S. A. V. Satya Murty, Director, EIRSG, IGCAR for providing constant guidance and support in completing this research. The authors are greatly indebted to the constant support and motivation provided by Dr. P. R. Vasudeva Rao, Director, IGCAR.

References

1. Waegeman, W., Verwaeren, J., Slabbinck, B., Baets, B.D.: Supervised learning algorithms for multi-class classification problems with partial class memberships. Fuzzy Sets and Systems. 184, 106–125 (2011)
2. Ventouras, E.M., Asvestas, P., Karanasiou, I., Matsopoulos, G.K.: Classification of Error-Related Negativity (ERN) and Positivity (Pe) potentials using kNN and Support Vector Machines. Computers in Biology and Medicine. 41, 98–109 (2011)
3. Chen, J., Wang, C., Wang, R.: Adaptive binary tree for fast SVM multiclass classification. Neurocomputing. 72, 3370–3375 (2009)
4. Kumar, M.A., Gopal, M.: Reduced one-against-all method for multiclass SVM classification. Expert Systems with Applications. 38, 14238–14248 (2011)
5. Siuly, Li, Y.: A novel statistical algorithm for multiclass EEG signal classification. Engineering Applications of Artificial Intelligence. 34, 154–167 (2014)
6. Oong, T.H., Mat Isa, N.A.: One-against-all ensemble for multiclass pattern classification. Applied Soft Computing. 12, 1303–1308 (2012)
7. Chen, K., Xu, L., Chi, H.: Improved learning algorithms for mixture of experts in multiclass classification. Neural Networks. 12, 1229–1252 (1999)
8. Ahmed, S.S., Rao, B.P.C., Jayakumar, T.: A framework for multidimensional learning using multilabel ranking. International Journal of Advanced Intelligence Paradigms. 5(4), 299–318 (2013)
9. Prusty, M.R., Chakraborty, J., Seetha, H., Jayanthi, T., Velusamy, K.: Fuzzy logic based transient identification system for operator guidance using prototype fast breeder reactor operator training simulator. Proceedings of the IEEE International Advance Computing Conference (IACC). 1259–1264 (2014)
10. Mishra, S., Bhenede, C.N., Panigrahi, B.K.: Detection and Classification of Power Quality Disturbances Using S-Transform and Probabilistic Neural Network. IEEE Transactions on Power Delivery. 23(1), 280–287 (2008)
11. Panigrahi, B.K., Pandi, V.R.: Optimal Feature Selection for Classification of Power Quality Disturbances Using Wavelet Packet based Fuzzy k-Nearest Neighbor Algorithm. IET Proceedings of Generation, Transmission and Distribution. 3(3), 296–306 (2009)
12. Saini, I., Singh, D., Khosla, A.: QRS detection using K-Nearest Neighbor algorithm (KNN) and evaluation on standard ECG databases. Journal of Advanced Research. 4, 331–344 (2013)
13. Yazdani, A., Ebrahimi, T., Hoffmann, U.: Classification of EEG signals using Dempster Shafer theory and a K-nearest neighbor classifier. Proceedings of the 4th International IEEE EMBS Conference on Neural Engineering, Antalya, Turkey. 327–330 (2009)
14. Vapnik, V.: The Nature of Statistical Learning Theory, Second Edition. Springer-Verlag, New York (1999)
15. Roobaert, D.: DirectSVM: a fast and simple support vector machine perceptron. Proceedings of the 2000 IEEE Signal Processing Society Workshop. 356–365 (2000)
16. Ahmed, S.S., Rao, B.P.C., Jayakumar, T.: Radial basis functions for multidimensional learning with an application to nondestructive sizing of defects. IEEE Symposium on Foundations of Computational Intelligence (FOCI). 38–43 (2013)
17. Chen, F., Tang, B., Song, T., Li, L.: Multi-fault diagnosis study on roller bearing based on multi-kernel support vector machine with chaotic particle swarm optimization. Measurement. 47, 576–590 (2014)
18. Pan, X., Lee, B., Zhang, C.: A Comparison of Neural Network Backpropagation Algorithms for Electricity Load Forecasting. IEEE International Workshop on Intelligent Energy Systems (IWIES). 22–27 (2013)
19. Design Document on PFBR Simulator. PFBR/08610/DN/1000/Rev A (2003)
