Power System Security Assessment Using AdaBoost Algorithm

Morteza Sadeghi*, Mohammad Amin Sadeghi†, Saber Nourizadeh*, Ali Mohammad Ranjbar* and Sadegh Azizi*

*School of Electrical Engineering, Sharif University of Technology, Tehran, Iran
Email: [email protected]

†School of Computer Engineering, Sharif University of Technology, Tehran, Iran
Email: [email protected]
Abstract—The AdaBoost algorithm is applied as a new machine learning approach to power system security assessment, classifying pre-fault data of the power system network. The requirements for implementing AdaBoost on a security assessment problem are discussed, and a simple transient stability assessment problem is solved. ROC curves are introduced as a criterion for the accuracy of the classification method. Different types of power system data are analyzed to understand the effect of each parameter on security assessment.

Index Terms—Power system security assessment, AdaBoost, Boosting, Transient stability assessment
I. INTRODUCTION

Security assessment is one of the most important aspects of power system operation and control. Its purpose is to evaluate the ability of the power system to withstand any credible contingency and to propose proper actions to remove system weaknesses [1]. Nowadays, real-time Dynamic Security Assessment (DSA) is widely used, due to restructuring in power systems and operation closer to stability margins. This assessment usually studies power system stability, which is classified into three categories: generator rotor angle stability, voltage stability and frequency stability. Transient stability (a subcategory of rotor angle stability), large-disturbance voltage stability and short-term frequency stability problems are the most important, because in such cases the operator has little time to take proper actions. The study period of interest is within a few seconds for transient stability, and up to several seconds for some large-disturbance voltage stability problems [2]. Therefore, calculation time is the essential key in solving such problems. Most traditional security assessment methods cannot be applied here, for they are either slow, like time domain simulations, or too difficult to use, like Lyapunov-type direct methods for transient stability assessment [3]. Automatic Learning (AL) has therefore been suggested as the key solution to this problem, due to its computational efficiency, its interpretability and its ability to handle uncertainties in operational power system problems [4].
After the introduction of Phasor Measurement Units (PMU) as a powerful measurement tool in power systems, which made fast and accurate state estimation algorithms possible, AL methods came into use in power system problems. AL designs automatic procedures that learn from a set of pre-solved samples, called the Learning Set (LS) [4]. Four main families of AL methods are used in the literature: Decision Tree (DT) based approaches, statistical pattern recognition methods such as the k-NN approach, Artificial Neural Networks (ANN) [5], and the Support Vector Machine (SVM). Several hybrid methods are also used to assess power system security [4]. The main assets of automatic learning are its computational efficiency, interpretability and ability to manage uncertainties. Although DTs are computationally efficient and interpretable, they are not capable of taking advantage of all accessible data and are sensitive to overtraining. Statistical pattern recognition and SVM methods are not computationally efficient enough to handle a large learning set. Finally, ANN approaches are computationally efficient but not interpretable [4]. Due to these disadvantages, a new method is proposed in this paper which has all the advantages: computational efficiency, accuracy, interpretability and sensitivity analysis, together with the ability to handle large volumes of information for bulk power systems within an acceptable training time. The AdaBoost algorithm is a machine learning approach which has been reported successful in many machine learning and computer vision applications [10]. Although AdaBoost is widely used in machine learning tasks, it has not yet been applied to power system operation and control problems. AdaBoost has several advantages over previously used methods. First, it is efficient in cases that deal with large amounts of data.
Millions of samples can be processed by AdaBoost in an affordable time. Second, AdaBoost is more robust than previous machine learning methods. Third, AdaBoost is able to use all the information in the data set, while many methods such as DTs are, in contrast, only capable of making a few decisions for each test datum. Fourth, it provides a threshold which can be used to adjust the trade-off between the false positive rate and the true positive rate; this feature can be used to handle uncertainties and to perform error analysis. Fifth, AdaBoost selects and assigns larger weights to the more informative features in the training data set; this provides information about the significance of different patterns in the training data and can be used for interpretation. Finally, AdaBoost is mathematically insensitive to overtraining and its training error converges to zero exponentially. The last feature is proven mathematically in [10].

Fig. 1. Weak and strong classifiers in the Boosting algorithm

This paper is organized as follows: Section II provides information about the AdaBoost algorithm for unfamiliar readers. In Section III, the method of implementing AdaBoost on a security assessment problem is presented. In Sections IV and V, simulations and experimental results are presented. Conclusions and future work are given in Section VI.

II. BOOSTING AND ADABOOST ALGORITHM

Boosting is a robust machine learning algorithm for object classification. Given several objects, each belonging to one of two given classes, the purpose of Boosting is to classify the objects into the two classes by examining their characteristics. Boosting combines the results of several weak classifiers to construct a strong classifier. A weak classifier is only slightly better than a random classifier, whereas a strong classifier is arbitrarily well correlated with a perfect classifier. Boosting develops a linear combination of the input set of weak classifiers in order to obtain a strong classifier, assigning larger weights to the more accurate weak classifiers and smaller weights to the less accurate ones. Fig. 1 illustrates the configuration of the weak classifiers, the strong classifier and the Boosting algorithm.

A. AdaBoost Algorithm

AdaBoost, short for Adaptive Boosting, is a branch of the Boosting algorithm which specifies the weights adaptively.
The AdaBoost algorithm, introduced in 1995 by Freund and Schapire [10], solved many of the practical difficulties of earlier boosting algorithms. To make the paper self-contained, a very simple version of AdaBoost is explained here. The algorithm takes as input a training set (x1, y1), ..., (xm, ym), where each xi is an object in the input training data belonging to the class yi, and each yi is either +1 or −1. In this paper AdaBoost is not extended to multi-class problems. The weak classifiers C1, ..., CT are also provided to AdaBoost. Each Ct is a function which takes an object xi as input and outputs a guess ỹi. The guessed class ỹi must be slightly correlated with the true class yi. AdaBoost calls the weak classifier Ct in a series of rounds t = 1, ..., T and calculates a weight αt for Ct according to its classification error εt. The algorithm also keeps weights w1, ..., wm for the training objects. Initially all weights are set equal, but in each round the weights of incorrectly classified objects are increased, so that the weak classifiers in future rounds are forced to focus on the hard examples in the training set. Pseudocode for AdaBoost is as follows:

• Inputs
  (x1, y1), ..., (xm, ym) where yi ∈ {+1, −1} and xi ∈ {object space}
  C1, ..., CT are weak classifiers
• Body
  Initialize wi = 1/m
  For t = 1, ..., T:
    calculate ỹi = Ct(xi) for i = 1, ..., m
    εt = Σi wi [yi ≠ ỹi]
    αt = (1/2) ln((1 − εt)/εt)
    wi = wi exp(−αt yi ỹi)/Zt for i = 1, ..., m, where Zt is a normalization factor that keeps w a distribution
• Output
  The final hypothesis (strong classifier) is:

  H(x) = sign( Σt=1..T αt Ct(x) )    (1)
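The pseudocode above can be sketched as a short, self-contained program. The following Python/NumPy snippet is an illustrative implementation, not the authors' original MATLAB code: it assumes single-feature threshold rules ("decision stumps") as the weak classifiers Ct, and the two-cluster toy data set is hypothetical.

```python
import numpy as np

def train_adaboost(X, y, T=20):
    """Train AdaBoost with decision stumps as weak classifiers.

    X: (m, d) feature matrix; y: (m,) labels in {+1, -1}.
    Returns a list of (alpha, feature, threshold, polarity) rules.
    """
    m, d = X.shape
    w = np.full(m, 1.0 / m)  # initialize w_i = 1/m
    rules = []
    for _ in range(T):
        best = None  # (error, feature, threshold, polarity, predictions)
        # Exhaustively search for the stump with the lowest weighted error.
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (+1, -1):
                    pred = np.where(X[:, j] > thr, pol, -pol)
                    err = w[pred != y].sum()  # eps_t = sum of misclassified weights
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)   # alpha_t = 1/2 ln((1-eps_t)/eps_t)
        w = w * np.exp(-alpha * y * pred)       # increase weights of misclassified objects
        w /= w.sum()                            # Z_t normalization keeps w a distribution
        rules.append((alpha, j, thr, pol))
    return rules

def score(rules, X):
    """Weighted vote sum(alpha_t * C_t(x)); its sign is the strong classifier H(x)."""
    s = np.zeros(len(X))
    for alpha, j, thr, pol in rules:
        s += alpha * np.where(X[:, j] > thr, pol, -pol)
    return s

# Hypothetical toy data: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 0.4, (50, 2)), rng.normal(-1.0, 0.4, (50, 2))])
y = np.array([+1] * 50 + [-1] * 50)

rules = train_adaboost(X, y, T=10)
H = np.sign(score(rules, X))
print("training accuracy:", (H == y).mean())
```

The magnitude of `score` also gives the confidence rate used later for the discrimination threshold, since adding a bias B to it shifts the decision boundary.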
The final hypothesis H(x) is either +1 or −1, which is the prediction of the strong classifier, and the value of |Σt=1..T αt Ct(x)| provides a confidence rate for the classification. Fig. 2 shows an example of a two-class discrimination. An advantage of Boosting over other classification methods is that the discrimination threshold can be varied: to change the discrimination threshold to B, one can add a bias B to the final decision rule.

Fig. 2. Discrimination in AdaBoost

B. ROC curves

A Receiver Operating Characteristic, or simply ROC, curve is a graphical plot of sensitivity vs. (1 − specificity) for a binary classifier as its discrimination threshold is varied. The ROC can equivalently be represented by plotting the fraction of true positives (hit rate) vs. the fraction of false positives (false alarm rate). ROC analysis provides tools to select the optimal discrimination threshold with respect to the problem criteria. Fig. 3 compares the ROC curves of three classifiers: a random classifier, a weak classifier and a strong classifier.

Fig. 3. Various ROC curves

Discrete classifiers, such as decision trees, yield only a class label; when a test set is given to such a classifier, the result is a single point in the ROC space. The adjustable threshold in AdaBoost makes the algorithm flexible and adaptable for different applications.

The most basic theoretical property of AdaBoost concerns its ability to reduce the training error. If each weak classifier is slightly better than random, then in each round of the algorithm the training error decreases, and therefore the training error drops to zero exponentially. As described in [10], although the training error drops fast, AdaBoost does not suffer from overfitting. This makes AdaBoost a desirable algorithm for two-class classification.

III. APPLYING ADABOOST TO THE SECURITY ASSESSMENT PROBLEM

A. Problem definition

The purpose of this paper is to solve a security assessment problem of a power system with an AL approach by means of classification. In this problem, in order to classify the set of Operating States (OS), the stability of various OS of the power system after the occurrence of a specified contingency is determined by offline simulations. This set is divided randomly into two subsets called the Learning Set (LS) and the Test Set (TS). Usually the LS is 4 to 5 times larger than the TS [6]. For simplification, only binary classification is considered, so the set of OS is classified into two categories: stable and unstable. A crisp border (such as the critical clearing time in transient stability) is selected to divide the set of OS into two subsets. This criterion should be crisp in order to avoid unnecessary additional complexity; a fuzzy border could be used in further research to gain better performance.

B. Implementing AdaBoost to the problem
As stated above, the pre-classified OS are divided randomly to generate the LS and TS. The LS is used to train the classifier. Each member of the LS is composed of various entries called attributes. These attributes are electrical states of the OS, such as voltage phasor angle and magnitude, active and reactive power flow through transmission lines, and the topological state of the power system network. As shown, these attributes can be of continuous or discrete data type. The AdaBoost algorithm extracts features from these attributes for every member of the LS and then classifies the LS using the features. The training time depends on the size of the LS; it is important to choose a proper size to avoid undertraining and unnecessary additional training time. The training time is usually lower than that of most AL methods. The accuracy of the method can be measured by applying the TS to the trained classifier. Classification is carried out by assigning different weights to each feature. It should be noted that these weights are valuable, because they represent the contribution of each feature to system stability against the specified contingency.
C. Developing the LS

As discussed above, the members of the LS are OS of the power system whose stability after the occurrence of a specified contingency is determined by offline simulations. There are two main approaches to developing the LS: the first, as mentioned earlier, is the use of available data on previous states of the system; the second is the generation of LS members by simulation. In this paper, only the latter approach is discussed [7]. Members of the LS are generated by solving power flow equations based on different levels of total load and its distribution among load centers, the distribution of active and reactive power generation among power plants, and the topological state of the transmission network. After generating enough samples, a load flow program is run to check the convergence of each OS; diverged ones are not applicable to the study. Usually, most of the OS (about 90%) converge. Their stability after the occurrence of the specified contingency is then determined, and the OS are classified into two categories: stable and unstable. This classified set is the basis of the LS and TS structure.
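The generation procedure just described can be sketched as follows. This Python snippet is illustrative only: the `run_power_flow` and `is_transiently_stable` stubs stand in for a real load flow solver and time domain simulation (in this paper, PSAT under MATLAB), and the base-case bus data are hypothetical. The 40% Gaussian load deviation matches the setting used in Section IV.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical base-case data: active load of each PQ bus (MW), common power factor.
base_load = np.array([120.0, 80.0, 200.0, 150.0])
power_factor = 0.95

def run_power_flow(p_load, q_load):
    """Placeholder for a real load flow solver (e.g. PSAT).
    Returns True if the power flow converges."""
    return True  # stub: assume convergence

def is_transiently_stable(p_load, q_load):
    """Placeholder for a time domain simulation of the specified
    contingency. Toy rule: unstable when total load is high."""
    return p_load.sum() < 1.15 * base_load.sum()

def generate_learning_set(n_samples):
    members = []
    for _ in range(n_samples):
        # Each PQ-bus load is Gaussian around its base value (40% deviation);
        # rare negative draws are clipped to zero.
        p = np.maximum(rng.normal(base_load, 0.4 * base_load), 0.0)
        # Constant power factor => reactive power follows from active power.
        q = p * np.tan(np.arccos(power_factor))
        if not run_power_flow(p, q):
            continue  # diverged OS are discarded
        members.append((p, q, is_transiently_stable(p, q)))
    return members

ls = generate_learning_set(1000)
stable = sum(1 for _, _, s in ls if s)
print(f"{len(ls)} converged members, {stable} stable")
```

In a real study the stubs would be replaced by the actual solver and simulator, and the member tuple would also record the topological state and generator dispatch.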
D. Training procedure

As shown, each member of the LS has several attributes. AdaBoost extracts features from these attributes by processing them individually and in various combinations. Important features usually include:
• Topological state of the power system network (line/bus outages)
• Voltage magnitude and angle
• Active and reactive power generation and demand at different buses
• Active and reactive power flow through transmission lines
• Angle difference between adjacent buses
• Total active load of predefined regions
Many other features could also be used to train the classifier for different problem types [7]. Obviously, increasing the amount of input information increases the accuracy of the classifier, but it also increases complexity and training time. It is therefore desirable to use a reasonable amount of information, trading off accuracy against the complexity and training time of the classifier.
IV. SIMULATIONS

A. Problem definition

The problem is to assess the transient stability of a power system in the case of a severe specified contingency, such as a three-phase short circuit fault on a busbar. The system under study is the 68-bus IEEE test model, which has 16 generator buses, 35 load buses and 86 lines and transformers. The transient stability of this system is studied after a three-phase short circuit fault occurs on bus 24, with a clearing time of 250 ms. The purpose of these simulations is to show the abilities of the AdaBoost algorithm in TSA problems, so it is assumed that the network topology is unchangeable and only the base topology is considered. In the base topology, all machines, lines, transformers and buses are connected to the network. The simulations were performed mainly with MATLAB R2006a. Database generation was executed with the PSAT toolbox version 2.1.2, modified by the authors.

B. Database generation

The procedure for generating the database was discussed in the previous section. To solve this problem, a database containing 9128 stable and 6872 unstable samples (16000 samples in total) was generated. The samples are OS of the power system and were used in different manners for different purposes; from now on they are called members. Since the topology of the power system network is unchangeable, the members of the database were produced by varying the active and reactive powers of the PQ buses and the active generation of the PV buses. For each OS, the active power of each PQ bus was chosen as a Gaussian random variable whose mean is the active power of the bus in the base case and whose deviation is 40% of the mean value. Assuming that the power factor is constant during load variation, the reactive power of each PQ bus is calculated from its resulting active power. The active power of the PV buses is set differently: once the active power of each load bus is set, the total active load is determined. Assuming a constant participation factor for each generator, the active power generation of each plant (except the slack bus) is a fraction of the total load. The participation factor is computed as

λi = Pi / Σj=1..n Pj    (2)

where λi is the participation factor of generator i, Pi is its active power output in the base case and n is the number of PV buses. After performing these computations for each OS, a load flow program is run to determine the convergence of the OS. Then a TDS is run on each converged OS, after applying the mentioned contingency, to determine its stability state. Each OS together with its stability state forms a member of the database. In this simulation each member of the database contains 187 entries, called features from now on. The first and second sets of 68 features are, respectively, the voltage angles and magnitudes of each bus. The next 35 features are the active powers of the PQ buses; the remaining 16 entries are the 15 active power generations of the PV buses plus 1 entry for the stability state of the member, which does not count as a feature.

Fig. 4. Column-shaped diagram of AdaBoost classification

C. Simulation results

The purpose of this simulation is to achieve high accuracy in TSA, which can be measured by ROC curves as discussed in Section II.
1) Accuracy: First, in order to compute the simulation accuracy, 10000 members of the database were selected randomly as the LS and the 6000 remaining members were used as the TS. 5869 members of the LS and 3259 members of the TS were stable.
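The ROC curves used to measure this accuracy follow directly from the adjustable threshold of Section II: sweeping a bias B over the strong-classifier scores Σ αt Ct(x) traces out one (false alarm rate, hit rate) point per threshold. The Python sketch below illustrates this with randomly generated, hypothetical confidence scores, not the paper's actual test set results.

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep the decision bias over strong-classifier scores and
    return (false_positive_rate, true_positive_rate) pairs."""
    pts = []
    for b in np.sort(np.unique(scores)):
        pred = np.where(scores >= b, +1, -1)  # shift the threshold to b
        tp = np.sum((pred == +1) & (labels == +1))
        fp = np.sum((pred == +1) & (labels == -1))
        tpr = tp / np.sum(labels == +1)       # hit rate
        fpr = fp / np.sum(labels == -1)       # false alarm rate
        pts.append((fpr, tpr))
    return pts

# Hypothetical scores: stable (+1) members tend to score higher.
rng = np.random.default_rng(1)
labels = np.array([+1] * 300 + [-1] * 200)
scores = np.concatenate([rng.normal(1.0, 1.0, 300), rng.normal(-1.0, 1.0, 200)])

pts = sorted(roc_points(scores, labels))
# Area under the curve by the trapezoidal rule; 0.5 = random, 1.0 = perfect.
auc = sum((f2 - f1) * (t1 + t2) / 2
          for (f1, t1), (f2, t2) in zip(pts, pts[1:]))
print(f"AUC = {auc:.3f}")
```

Picking the operating threshold then amounts to choosing the point on this curve whose false alarm rate and hit rate best match the problem criteria.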
Fig. 5. ROC curves for different sizes of LS
By using the LS to execute the training procedure, AdaBoost was able to classify the TS with high accuracy. In Fig. 4, a column-shaped diagram shows the distribution of the TS and the stability boundary. Since most false alarms occur near the stability margin, reporting them as alarms usually does not cause any serious problem. Accuracy is also affected by the size of the LS. In Fig. 5, ROC curves are drawn for three LS sizes: 100, 1000 and 10000. Increasing the LS size from 1000 to 10000 has a negligible influence on classification accuracy, so increasing the LS size beyond 10000 members is not recommended.
2) Training time: As stated before, increasing the size of the problem, by increasing the number of LS members or the number of features per member, increases the training time almost linearly. Fig. 6 shows the effect of the number of LS members on training time; the training times of different LSs are drawn to show the linear behavior of this parameter. These results show the great ability of the AdaBoost algorithm in solving TSA problems. A comprehensive discussion is presented in the next section.

Fig. 6. Training time vs. number of training samples

V. DISCUSSION

The purpose of this section is to analyze the results and compare them with other popular automatic learning and machine learning approaches.

A. Results analysis

1) Accuracy: According to Fig. 5, increasing the number of LS members increases classification accuracy. It can be seen that, for a problem of this size, increasing the number of LS members to 10000 and beyond does not significantly affect classification accuracy. Depending on the type and size of the problem, an LS containing 1000 to 10000 members is suitable. It should be noted that this proposed LS size is only suitable for a very simple problem: one contingency with fixed network topology. Although some approaches such as Global DTs (GDT) can evaluate security for multiple contingencies, their accuracy is usually lower than that of single-contingency DTs [7], [8]. The AdaBoost algorithm is able to learn from a large set of information: more than 10 million samples with 1000 features. It is also possible to use a different classifier for each contingency, because the memory required for the classifier and its training set is very small compared to other machine learning approaches, and according to Fig. 6 the training procedure is performed in a small amount of time.
2) Training time: Because of its offline nature, the importance of training time is usually neglected in the analysis of most machine learning approaches. According to Fig. 6, increasing the LS size increases training time linearly. This property comes from the AdaBoost learning rule: finding many weak rules is easier, and thus faster, than finding one strong rule. The linear growth of AdaBoost's training time makes it a very strong tool for solving bulk power system security problems.

B. Comparison

The AdaBoost algorithm has many advantages over other AL and machine learning methods. More powerful classification, shorter training time and a low probability of overtraining are its most important properties. These properties are discussed in
the following paragraphs.
In most machine learning studies, about 90 percent of the LS members used were stable, so labeling all members as stable would already yield a classification accuracy of about 90%; previous classifiers increase this accuracy to about 98% [6], [10]. Reducing the purity of the LS would significantly reduce the accuracy of those classifiers. AdaBoost achieves accuracies of up to 98.5% with an LS of only about 60% purity. Although in most cases a fault occurrence does not result in system instability, assuming 60% purity in the LS simply demonstrates the great classification power of the AdaBoost algorithm.
Another advantage of the AdaBoost algorithm is its ability to produce ROC curves. ROC curves give the operator some flexibility to choose between the importance of reducing false alarms and the true positive rate. This concept had not been exploited in the other machine learning approaches applied to this problem. In most AL methods, training time grows exponentially with the size of the training set, so they cannot be used for large problems; the linear growth of training time in AdaBoost makes its use possible in bulk power system security problems. Machine learning approaches such as DTs assess security locally [6], while in the AdaBoost algorithm security assessment is performed globally; considering the whole power system network in the security study gives more reliable and plausible results. This is one of the unique attributes of the AdaBoost algorithm.

VI. CONCLUSION

A. Conclusions

The AdaBoost algorithm was presented in this paper as a new machine learning approach to the power system security assessment problem. A simple transient stability assessment problem was solved by AdaBoost to determine the benefits of using AdaBoost as a classifier. The main benefits of using AdaBoost are as follows:
• Linear growth in training time with increasing problem size.
• High accuracy compared to other machine learning approaches, and the ability to produce ROC curves.
• Ability to display the effects of different features in the security assessment problem.
• Being a very strong classifier compared to other machine learning approaches.
• Global, rather than local, security assessment.
These benefits make the AdaBoost algorithm very useful in any security assessment problem.

B. Recommendations and future work

The following studies are open to interested readers:
• Applying AdaBoost to different security assessment problems such as voltage stability
• Using AdaBoost to determine the effect of each feature type
• Using AdaBoost to determine the effect of distance from the fault location
• Combining the AdaBoost algorithm with fuzzy logic and concepts to produce a stronger classifier

REFERENCES
[1] L. Wehenkel, "Machine learning approaches to power system security assessment," IEEE Expert, vol. 12, no. 5, pp. 60-72, Sept.-Oct. 1997.
[2] P. Kundur et al., "Definition and classification of power system stability," IEEE Transactions on Power Systems, vol. 12, no. 2, May 2004.
[3] M. Pavella, "Power system transient stability assessment - traditional vs. modern methods," Control Engineering Practice, vol. 6, pp. 1233-1246, Elsevier Science Ltd, 1998.
[4] L. Wehenkel and M. Pavella, "Why and which automatic learning approaches to power system security assessment," CESA Multiconference on Computational Engineering in Systems Applications, July 1996.
[5] M. Pavella and P. G. Murthy, Transient Stability of Power Systems, John Wiley and Sons, 1994.
[6] L. Wehenkel and M. Pavella, "Decision trees and transient stability of power systems," Automatica, vol. 27, no. 1, pp. 115-134, 1991.
[7] L. Wehenkel, M. Pavella, E. Euxibie and B. Heilbronn, "Decision tree based transient stability method: a case study," IEEE Transactions on Power Systems, vol. 9, no. 1, Feb. 1994.
[8] V. B. Akella et al., "Multicontingency decision trees for transient stability assessment," 11th Power Systems Computation Conference (PSCC).
[9] L. Wehenkel and M. Pavella, "Decision tree approach to power systems security assessment," International Journal of Electrical Power and Energy Systems, vol. 15, pp. 13-36.
[10] Y. Freund and R. E. Schapire, "A short introduction to boosting," Journal of Japanese Society for Artificial Intelligence, pp. 771-780, September 1999.