Stacked-GRU Based Power System Transient Stability Assessment Method

Feilai Pan 1, Jun Li 1, Bendong Tan 2,*, Ciling Zeng 1, Xinfan Jiang 1, Li Liu 1 and Jun Yang 2

1 State Grid Hunan Electric Power Company, Changsha 410000, China; [email protected] (F.P.); [email protected] (J.L.); [email protected] (C.Z.); [email protected] (X.J.); [email protected] (L.L.)
2 School of Electrical Engineering, Wuhan University, Wuhan 430000, China; [email protected]
* Correspondence: [email protected]; Tel.: +86-27-68776346
Received: 24 July 2018; Accepted: 7 August 2018; Published: 9 August 2018
Abstract: With the interconnection of large power grids, security and stability issues have become increasingly prominent. Data-driven adaptive transient stability assessment methods for power systems achieve a good balance between speed and accuracy, but their construction is complicated and their parameters are difficult to obtain. This paper proposes a stacked-GRU (Gated Recurrent Unit) based intelligent transient stability assessment method, which builds a stacked-GRU model based on time-dependent parameter sharing and spatial stacking. Using the time series data collected after a power system fault, offline training is performed to obtain the optimal parameters of the stacked-GRU. In online application, the assessment is made within a confidence framework. The performance of the proposed adaptive transient stability assessment method is investigated on the New England power system. Simulation results show that the proposed model realizes reliable and accurate assessment of transient stability, with the advantages of short assessment time and a less complex model structure, leaving more time for emergency control.

Keywords: data-driven; adaptive transient stability assessment; stacked-GRU; time series; intelligent assessment system
1. Introduction

With the development of the economy, load demand is increasing, and the power system is operating ever closer to its transmission capacity limit. In addition, the pace of nationwide interconnection is gradually advancing, the security and stability characteristics of the power grid are increasingly complex, and the risk that the power system becomes unstable after a fault is rising [1]. Quickly and accurately determining the stability of the power system is therefore an urgent problem that must be solved to allow for the safe operation of the power grid.

At present, time domain simulation [2], the energy function method [3,4], machine learning methods [5,6], and the dynamic security domain [7] are studied in research on power system transient stability assessment (TSA). The time domain simulation method and the energy function method still have great limitations after many years of development and have not effectively solved the TSA problem. With the breakthroughs of machine learning in image processing, speech processing, and other fields [8], TSA based on data-driven methods has begun to attract the attention of many scholars. The use of machine learning methods for TSA is also known as an intelligent system (IS) [9–11]. An IS can find the relationship between power system features and stability, and, compared with the time domain simulation and energy function methods, it has the advantages of fast assessment speed, strong generalization ability, and the ability to estimate dynamic security domains for control decisions [9].
In order to achieve rapid TSA, a large number of machine learning algorithms, such as the BP neural network [12], support vector machine [13,14], decision tree [15], random forest [16], and K-nearest neighbor [17], are applied in intelligent systems. Such algorithms mainly use the key features of the power system after feature selection to establish a functional relationship with transient stability. Once the fault is cleared, the transient stability assessment can be performed, but there is no guarantee of 100% accuracy of assessment. Therefore, an adaptive assessment method [9,18] has emerged in recent years. By determining a time window of fixed length after the fault, the data of m sampling points in the time window are used for training; the assessment is performed in chronological order until the assessment confidence meets the requirements. Such approaches guarantee the correctness of assessment, but the assessment time is long and the construction of the model is very cumbersome: m data sampling points mean that m classifiers need to be trained. Although the authors of a past paper [18] considered the influence of historical information on the current assessment in the time series, which reduces the assessment time to a certain extent, a large number of classifiers still needs to be trained.

Aiming at solving the problems existing in the above adaptive assessment methods, the main contribution of this paper is to propose a stacked-GRU based intelligent TSA model, which realizes parameter sharing at each moment through the stacked-GRU and utilizes the time series data after the power system fault to train with a less complex model structure. The offline training is performed to obtain the optimal parameters. In online application, it is judged whether the current assessment result satisfies the confidence level at the time of fault clearing; otherwise, measurements continue to be provided to the next moment for assessment until the confidence level is satisfied. On the one hand, the model can perform TSA at the current moment; on the other hand, it can provide historical information for the assessment at the next moment when the result at the current moment is unreliable, so there is no need to train multiple classifiers, and the problem of cumbersome training is solved. In addition, the method draws on the idea of deep learning [19] to use "stacking" to learn representational features from the input, which can further improve the accuracy of the classifier.

The remainder of the paper is organized as follows. Section 2 introduces the structure of the proposed stacked-GRU. Section 3 proposes the transient stability intelligent assessment method. Simulations and analysis are carried out in Section 4. Finally, a conclusion is drawn in Section 5.
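To make the confidence-gated online procedure described above concrete before moving on, the following is a minimal Python sketch of the sequential assessment loop. The classifier `model`, the measurement source `get_measurements`, and the confidence threshold `delta` are hypothetical placeholders, not interfaces defined in this paper.

```python
def online_assessment(model, get_measurements, delta=0.95, max_steps=20):
    """Assess transient stability sequentially until the confidence threshold is met.

    model:            trained classifier returning P(stable) for a measurement sequence
    get_measurements: callable returning the feature vector of the t-th post-fault sample
    delta:            required confidence level (assumed value)
    max_steps:        upper bound on how many sampling points to wait for
    """
    sequence = []
    for t in range(max_steps):
        sequence.append(get_measurements(t))   # extend the post-fault time series
        p_stable = model.predict(sequence)     # assess with all data seen so far
        confidence = max(p_stable, 1.0 - p_stable)
        if confidence >= delta:                # result is reliable: stop and report
            return p_stable >= 0.5, t + 1      # (stable?, sampling points used)
    return p_stable >= 0.5, max_steps          # fall back to the last assessment
```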
2. Methodology

2.1. Gated Recurrent Unit

As shown in Figure 1 [20], the GRU (Gated Recurrent Unit) [21] is a kind of recurrent neural network (RNN) designed to solve the problems of gradient explosion and gradient disappearance [22]. Traditional neural networks such as the BP neural network [23] and convolutional neural networks [19] are not good at processing time series information, whereas the GRU, as a variant of the RNN, can combine historical information with the current time to predict further information, so it has been widely used in speech recognition, language modeling, translation, picture description, and other issues [24].
Figure 1. Gated Recurrent Unit.
The forward propagation of the GRU unit is given in Equations (1)–(5), where "$\cdot$" denotes matrix multiplication and "$\odot$" denotes element-wise multiplication. The GRU's update gate $z_t$ and reset gate $r_t$ are used to control the direction of the data stream at time $t$. They are calculated as follows:

$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$  (1)

$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$  (2)

where $x_t$ is the input at time $t$, $W_z$ is the weight of the update gate, $W_r$ is the weight of the reset gate, and $h_{t-1}$ is the output of the hidden layer at time $t-1$.

When calculating $\tilde{h}_t$, the output of the candidate hidden layer at time $t$, the information of the historical time (the output of the hidden layer at time $t-1$) is retained, and the amount of retained historical information is controlled by the value of $r_t$, as shown by:

$\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t])$  (3)

where $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$.

Finally, $z_t$ controls how much information of the previous hidden layer is forgotten and how much information of the candidate hidden layer is retained, and the output of the hidden layer is calculated as:

$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$  (4)

Then the output of the GRU is:

$\tilde{y}_{last} = \sigma(W_o h_{last} + b_o)$  (5)

where $\tilde{y}_{last}$ is the output of the output layer at the last moment of the time series, $W_o$ is the weight of the output layer, and $b_o$ is the bias of the output layer.

If the reset gate is close to 0, the output of the hidden layer at the previous moment will not be preserved, and the GRU unit discards historical information unrelated to the future. The update gate determines how much information of the hidden layer at time $t-1$ needs to be saved in the output of the hidden layer at time $t$. If its elements are close to 1, the corresponding information of the hidden layer at the previous moment is copied to time $t$; the update gate unit with long-distance dependence is then active, so long-distance information can be learned. If its elements are close to 0, the unit is equivalent to a standard RNN and processes short-distance information; the update gate unit with short-distance dependence is then active. Under this mechanism, historical information can flexibly help predict future information.

When training the model, the GRU uses backpropagation through time (BPTT) [25] for network optimization. The target loss function to optimize is:

$L = \sum_{i}^{N} y^{(i)} \ln \tilde{y}_{last}^{(i)}$  (6)
In the formula, $y^{(i)}$ is the real label of the i-th sample, $\tilde{y}_{last}^{(i)}$ is the predicted label at the last moment of the i-th sample, and N is the total number of samples.

2.2. Stacked-GRU

The GRU is a shallow model with weak capability of feature extraction, so the stacked-GRU is composed of several stacked GRU units, as shown in Figure 2. Specifically, the input of the first layer in the stacked-GRU is the original data, and its formulas are the same as those of the GRU unit in Section 2.1.
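As a concrete illustration of Equations (1)–(5), the following is a minimal NumPy sketch of a single GRU unit's forward step and the final sigmoid readout. The array names, the hidden size, the window length, and the random initialization are illustrative assumptions, not the implementation used in this paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Wr, W):
    """One GRU forward step, following Equations (1)-(4).

    x_t:       input at time t, shape (input_dim,)
    h_prev:    hidden state h_{t-1}, shape (hidden_dim,)
    Wz, Wr, W: gate weights, each of shape (hidden_dim, hidden_dim + input_dim)
    """
    concat = np.concatenate([h_prev, x_t])                       # [h_{t-1}, x_t]
    z_t = sigmoid(Wz @ concat)                                   # update gate, Eq. (1)
    r_t = sigmoid(Wr @ concat)                                   # reset gate, Eq. (2)
    h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))   # candidate state, Eq. (3)
    return z_t * h_prev + (1.0 - z_t) * h_tilde                  # new hidden state, Eq. (4)

# Illustrative dimensions: 368 input features (Section 3.1), 64 hidden units and T = 10 assumed.
input_dim, hidden_dim, T = 368, 64, 10
rng = np.random.default_rng(0)
Wz, Wr, W = (rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.01 for _ in range(3))
Wo, bo = rng.standard_normal(hidden_dim) * 0.01, 0.0

h = np.zeros(hidden_dim)
for t in range(T):                               # run over the time series x_1 .. x_T
    x_t = rng.standard_normal(input_dim)         # placeholder post-fault measurement vector
    h = gru_step(x_t, h, Wz, Wr, W)

y_last = sigmoid(Wo @ h + bo)                    # sigmoid readout, Eq. (5)
print(f"predicted stability probability: {y_last:.3f}")
```

Stacking then amounts to feeding the hidden-state sequence of one such unit in as the input sequence of the next, which is exactly the substitution made in Equation (7) below.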
Figure 2. The structure of stacked-GRU.
The input of each GRU unit in the middle layers is the output of the hidden layer of the upper-layer GRU unit:

$z_t^i = \sigma(W_z^i \cdot [h_{t-1}^i, h_t^{i-1}])$
$r_t^i = \sigma(W_r^i \cdot [h_{t-1}^i, h_t^{i-1}])$
$\tilde{h}_t^i = \tanh(W^i \cdot [r_t^i \odot h_{t-1}^i, h_t^{i-1}])$
$h_t^i = z_t^i \odot h_{t-1}^i + (1 - z_t^i) \odot \tilde{h}_t^i$  (7)

Among them, the superscript i represents the i-th GRU unit, and the subscript t represents the moment t. The hidden layer of the last GRU layer adds a sigmoid layer as the classifier to provide the output:

$\tilde{y}_{last} = \sigma(W_o^n h_{last}^n + b_o^n)$  (8)

In the formula, $\tilde{y}_{last}^{(i)}$ is the predicted label at the last moment of the i-th sample, $W_o^n$ is the weight of the output layer, and $b_o^n$ is the bias of the n-th GRU unit. The training method of the stacked-GRU and the optimized objective function are the same as for a single GRU. The objective function is:

$L = \sum_{i}^{N} y^{(i)} \ln \tilde{y}_{last}^{(i)}$  (9)

On the one hand, such a deep structure can efficiently discover high-level features from limited data. On the other hand, it can more fully utilize the information in the time series, which can effectively improve the performance of the classifier. In addition, the stacked-GRU model parameters are independent of time, so the trade-off between time and precision in reference [18] can be avoided.

3. Transient Stability Intelligent Assessment Method

3.1. Offline Training

In order to train the model effectively, the angles of generators, active power of generators, reactive power output of generators, bus voltage amplitudes and phases, active and reactive power of lines, and active and reactive power of loads are selected as input features. The total dimension of the input features is 368. The feature vector is denoted as $x_t$, representing the t-th sampling point in the time series after fault clearing; then the input of the model can be denoted as $X = [x_1, x_2, \cdots, x_t, \cdots, x_T]$, where T is the length of the timing observation window selected after the fault is cleared.
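A stacked-GRU classifier over this (T, 368) input can be written compactly with an off-the-shelf deep learning library. The following Keras sketch stacks two GRU layers and a sigmoid output; the number of stacked layers, the hidden sizes, the window length T, and the optimizer settings are illustrative assumptions rather than the configuration reported in this paper.

```python
import tensorflow as tf

T = 10            # assumed length of the post-fault observation window
N_FEATURES = 368  # input feature dimension given in Section 3.1

# Two stacked GRU layers: the first returns its full hidden-state sequence,
# which becomes the input sequence of the second layer (cf. Equation (7)).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(T, N_FEATURES)),
    tf.keras.layers.GRU(64, return_sequences=True),
    tf.keras.layers.GRU(64, return_sequences=False),
    # Sigmoid readout on the last hidden state, cf. Equation (8).
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Binary cross-entropy is used here as a stand-in for the log-likelihood
# objective of Equation (9).
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

During offline training, such a model would be fitted on the labeled fault-sequence data set described next; in online application, the same model is queried once per newly arrived sampling point.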
In offline training, time domain simulation is used to generate a fault sequence data set of fixed time length under various operating conditions, and the labels are set as the corresponding stable state, which is defined as: