IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 29, NO. 1, FEBRUARY 1999



Neural-Network-Based Fuzzy Model and Its Application to Transient Stability Prediction in Power Systems

Mu-Chun Su, Chih-Wen Liu, and Shuenn-Shing Tsay

Abstract—This paper presents a general approach to deriving a new type of neural-network-based fuzzy model for a complex system from numerical and/or linguistic information. To efficiently identify the structure and the parameters of the new fuzzy model, we first partition the output space instead of the input space. As a result, the input space itself induces corresponding partitions, within each of which inputs have similar outputs. We then use a set of hyperrectangles to fit the partitions of the input space. Consequently, the premise of an implication in the new type of fuzzy rule is represented by a hyperrectangle and the consequence is represented by a fuzzy singleton. A novel two-layer fuzzy hyperrectangular composite neural network (FHRCNN) can be shown to be computationally equivalent to such a special fuzzy model. The process of presenting input data to each hidden node in an FHRCNN is equivalent to firing a fuzzy rule. An efficient learning algorithm was developed to adjust the weights of an FHRCNN. Finally, we apply FHRCNN’s to provide real-time transient stability prediction for use with high-speed control in power systems. Simulation tests on the IEEE 39-bus system reveal that the proposed FHRCNN can yield a much better performance than conventional multilayer perceptrons (MLP’s) in terms of computational burden and classification rate.

Index Terms—Fuzzy systems, neural networks, transient stability prediction.

I. INTRODUCTION

Neural networks and fuzzy systems have attracted the growing interest of researchers in various disciplines of engineering and science. Their applications range widely from consumer products to decision analysis. Basically, a neural network is a massively parallel distributed processor. Among the many appealing properties of a neural network, the one of primary significance is its ability to inductively learn concepts from given numerical data. A neural network improves its performance by adjusting its synaptic weights. Feedforward neural networks [e.g., multilayer perceptrons (MLP’s)] have been proven to be able to approximate any real continuous function on a compact set to arbitrary accuracy [1]–[3]. Therefore, a feedforward neural network is an efficient tool for system modeling and identification. However, a feedforward neural network has three major disadvantages. The first is that there is no systematic way to set up the topology of a neural network. The second is that it usually takes a lot of time to train a neural network. The third and most apparent is that a trained neural network is unable to explain its response (i.e., the inference process cannot be stated explicitly). Therefore, even if we can finally model a complex system by a trained neural network, the knowledge encoded in the values of the parameters of the trained neural network is not physically meaningful to humans, who depend on appropriate and understandable information to make decisions. Accordingly, a way to acquire a relevant and meaningful system description from observed data or experience is very much in demand.

Manuscript received January 6, 1997; revised December 18, 1997 and June 2, 1998.
M.-C. Su is with the Department of Electrical Engineering, Tamkang University, Tamsui, 25137 Taiwan, R.O.C. (e-mail: [email protected]).
C.-W. Liu and S.-S. Tsay are with the Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.
Publisher Item Identifier S 1094-6977(99)00095-4.

1094–6977/99$10.00 © 1999 IEEE

Authorized licensed use limited to: National Taiwan University. Downloaded on March 9, 2009 at 04:08 from IEEE Xplore. Restrictions apply.


Fuzzy modeling provides an appealing solution to this problem because the fuzzy approach is in a sense matched to human reasoning or decision making. In addition, it is a modeling tool that has simplicity and generality. Fuzzy modeling is based on fuzzy implications and inference. In general, the fundamental idea behind fuzzy modeling is to describe a system by establishing the fuzzy input–output relation, which is usually expressed in terms of fuzzy if–then rules. Each fuzzy rule maps a fuzzy partition of the input space into another fuzzy partition of the output space. The fuzzy inference mechanism then proceeds through the rules, followed by interpolation. Consequents from different rules are numerically combined to provide appropriate outputs to inputs. This process is referred to as “defuzzification.”

During the last two decades, a number of different approaches to fuzzy modeling have been proposed. Basically, there are three distinct classes of fuzzy models. They are linguistic fuzzy models introduced by Zadeh [4], fuzzy linear models introduced by Takagi and Sugeno [5], and fuzzy relational models [6], [7]. A unifying view of these three models was given in [8]. Each has its own advantages and disadvantages.

In this paper, we discuss a new method of fuzzy modeling based on a class of fuzzy hyperrectangular composite neural networks (FHRCNN’s). Basically, we can build a fuzzy model of a system without a priori knowledge about the system, provided that numerical input–output data are available. However, linguistic information, if it exists, may accelerate the building procedure. In our previous work [9], [10], we developed a method for extracting crisp if–then rules directly from numerical input–output data for pattern recognition. The method is based on applying the supervised decision-directed learning (SDDL) algorithm to train a class of hyperrectangular composite neural networks (HRCNN’s).
Each generated hyperrectangle, which shows the existence region of data, corresponds to a crisp rule. Basically, FHRCNN’s are a fuzzified version of HRCNN’s [11], [12]. In an FHRCNN, the synaptic weights of a hidden node define a hyperrectangle, which then corresponds to a fuzzy rule. As weighted outputs of hidden nodes propagate to an output node, the defuzzification mechanism proceeds to provide a crisp output. To develop a fuzzy model of a system, the number of hidden nodes and the synaptic weights have to be identified. The method is as follows. First, we divide the range of an output variable into multiple intervals, and each interval represents a class. Then we use the SDDL algorithm to generate hyperrectangles to classify these quantized input–output patterns. The first several most representative hyperrectangles are selected to construct a fuzzy rule base. Finally, we apply the backpropagation algorithm or the least mean square (LMS) algorithm to fine-tune the FHRCNN so as to construct a fuzzy model.

The problem of transient stability prediction is examined to check the effectiveness of the FHRCNN’s. With the advent of phasor measurement units (PMU’s) that are capable of making real-time phasor measurements, the real-time assessment of the stability of a transient swing after a severe disturbance in power systems has become an important area of research. An application of this technique is to determine whether an evolving swing will be stable or unstable and then to select an appropriate remedial action control strategy. Several approaches for solving real-time transient stability assessment problems using PMU’s have been proposed [13]–[17]. Many transient stability assessment techniques, such as in [27], involve solving the model equations; therefore, they are too complicated and computationally inefficient for real-time use. In this paper, the FHRCNN is applied for real-time transient stability prediction using PMU’s.
This neurofuzzy approach can learn offline from a training set and is used online to predict future behavior of new data much faster than would be possible by solving the model analytically.

This paper is organized into five sections. Section II gives a brief review of fuzzy models and then states the motivation for developing the new type of fuzzy model based on FHRCNN’s. In Section III, we introduce the architecture and characteristics of FHRCNN’s. In Section IV, we apply our fuzzy neural network to a real-time transient stability prediction problem. Conclusions are given in Section V.

II. CLASSIFICATION OF FUZZY MODELS

In this section, we first briefly describe the three most popular types of fuzzy models and then compare them regarding the structural dependence that is assumed to exist between the model inputs and outputs. We then state the motivation for proposing a new fuzzy model based on FHRCNN’s.

A. Linguistic Fuzzy Models

A typical linguistic rule is of the following form:

$$R^{(j)}:\ \text{If } x_1 \text{ is } A_1^j \text{ and } \cdots \text{ and } x_n \text{ is } A_n^j \text{ then } y \text{ is } B^j \tag{1}$$

where $A_i^j$ and $B^j$ are linguistic variables, $x = (x_1, \ldots, x_n)^T \in U \subset \mathbb{R}^n$ and $y \in V \subset \mathbb{R}$ are crisp or linguistic input and output variables of the $j$th rule, respectively, and $j = 1, 2, \ldots, J$. Both the rule antecedents and consequents are defined by means of fuzzy sets that are characterized by membership functions $\mu_A$ and $\mu_B$, respectively. Generally, the performance of the fuzzy model depends on the kind of membership function, type of fuzzifier, fuzzy logic operator, type of fuzzy inference, and defuzzification procedure. The reason is that different combinations of operators, defuzzifiers, and fuzzifiers result in different crisp outputs. Therefore, selecting an appropriate combination is a big challenge. Some criteria (e.g., computational efficiency and ease of adaptation) are suggested by Wang [18].

There are two ways of obtaining linguistic rules. The first and most straightforward way is to ask human experts. These given linguistic rules represent the policies and heuristic strategies of the corresponding decision-making experts. The membership functions for each of the fuzzy subsets appearing in the linguistic rules should also be specified by the experts at the same time. In most cases, the initial rule base constructed from the given linguistic rules and the corresponding membership functions is too crude for engineering purposes. While linguistic rules allow fast development of a fuzzy model, the performance of the fuzzy model depends strongly on the rules and on how closely and correctly the membership functions reflect the linguistic values expressed in the rules. The second way to obtain linguistic rules is to use training algorithms based on numerical data. This usually involves partitioning the premise and consequence spaces and establishing the mapping from the premise space to the consequence space.
Then, some training algorithms are utilized to adjust the structures and the parameters of the fuzzy model based on numerical information. Basically, this method results in grid fuzzy partitions. Fig. 1 illustrates an example of grid fuzzy partitions. If input variables are highly correlated with one another, the number of partitions increases to capture the correlations. The authors in [19] proposed a simple training algorithm that performs just a one-pass operation on the training data. The key ideas of the approach are to generate fuzzy rules from numerical data pairs and to collect these fuzzy rules and the linguistic rules given by human experts into the final fuzzy model. Although it requires much less construction time, the price paid for this simplicity is that we have to determine the partitions of the domain intervals and the membership functions in an ad hoc manner. Instead of taking fuzzy grid partitions, the authors in [20] introduced the fuzzy c-means (FCM) method to generate fuzzy rules from numerical data. They first clustered the output data and then induced fuzzy clusters in the input space. The advantage of this approach is that unnecessary grid fuzzy partitions can be avoided. However, the computational requirements for clustering data, deciding the appropriate number of clusters, and shaping the membership functions may be very intense.

Fig. 1. Example of grid fuzzy partitions.

Fig. 2. Example of general fuzzy partitions.

B. Takagi–Sugeno Fuzzy Models

Instead of considering the linguistic rules in the form of (1), Takagi and Sugeno [5] extended the linguistic rules to rules with consequents in the form of linear functions of antecedent variables, for instance

$$R^{(j)}:\ \text{If } x_1 \text{ is } A_1^j \text{ and } \cdots \text{ and } x_n \text{ is } A_n^j \text{ then } y^j = c_0^j + c_1^j x_1 + \cdots + c_n^j x_n \tag{2}$$

where $A_i^j$ are fuzzy sets, $x = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n$ is a crisp input vector, $c_i^j$ are real-valued parameters, $y^j$ is the crisp system output due to rule $R^{(j)}$, and $j = 1, 2, \ldots, J$. Each rule represents a locally linearized model. The advantage of this fuzzy linear model is that the parameters $c_i^j$ of the model can be easily identified from numerical data. A weak point of the model is that the fuzzy linear rules are difficult to interpret compared to linguistic rules. Therefore, it is difficult to combine the linguistic information from human experts with the numerical information from experimental data to construct a Takagi–Sugeno (TS) fuzzy model. In addition, another main disadvantage of this model using trapezoidal membership functions is that the local linearization characteristic may deteriorate the approximation accuracy when the linear model is adopted to identify a highly nonlinear system. On the other hand, if we use Gaussian membership functions and product operators, the output of a zero-order (i.e., $y^j = c_0^j$) TS fuzzy model is functionally equivalent to a radial basis function (RBF) network [21].

C. Singleton Fuzzy Models

After these discussions about the linguistic fuzzy models and the TS fuzzy models, it seems that all we need is the following fuzzy rule, whose consequence part is a fuzzy singleton:

$$R^{(j)}:\ \text{If } x_1 \text{ is } A_1^j \text{ and } \cdots \text{ and } x_n \text{ is } A_n^j \text{ then } y \text{ is } c_0^j. \tag{3}$$

One approach to identifying this kind of fuzzy model is to represent the fuzzy model as a three-layer feedforward neural network and then to use the backpropagation algorithm to adjust the parameters of the fuzzy model [22]. To overcome the disadvantages of the backpropagation algorithm (e.g., it may be trapped at a local minimum or converge very slowly), the author proposed to fix some parameters in the fuzzy model so that the fuzzy model can be represented as a linear combination of the so-called fuzzy basis functions (e.g., Gaussian functions). Then, the orthogonal least-squares algorithm is utilized to select the significant fuzzy basis functions and the corresponding optimal coefficients. Although these two training algorithms perform successfully, they still require substantial computations for complex problems.

D. Choosing the Right Model

Having described the three popular fuzzy models, we may summarize some of their important properties, e.g., model complexity, linguistic interpretation, and construction from data. Linguistic models have a good linguistic interpretation but are less efficient (more rules are needed) and require more informative data for identification. TS fuzzy models provide good identification properties (the consequent parameters can be easily estimated); however, it is difficult to combine the linguistic information with the numerical information to construct a TS fuzzy linear model. To a certain degree, singleton fuzzy models inherit the good identification properties from TS models and possess the advantages of better interpolation of the linguistic models. Although the consequent parts of the three models are different, their antecedents are the same. They employ grid partitions; therefore, they encounter problems when we have a moderately large number of inputs. For instance, a fuzzy model with ten inputs and two membership functions on each input would result in $2^{10} = 1024$ fuzzy rules, which is prohibitively large. In addition, Takagi and Hayashi [23] stated that fuzzy grid partitions cannot capture the correlations between input variables. In order to overcome this weak point, the input space should be partitioned appropriately to reflect the correlations. Therefore, it is better to use fuzzy rules of the following type:

$$R^{(j)}:\ \text{If } x \text{ is } A^j \text{ then } y \text{ is } c_0^j \tag{4}$$

where $A^j$ is a multidimensional fuzzy set defining an arbitrarily shaped region. The general fuzzy partitions offer the capability of capturing the correlations of input variables and limit the number of rules to a reasonable number. Fig. 2 illustrates such an example of general fuzzy partitions. Accordingly, the complexity of defining this kind of multidimensional fuzzy set increases. Takagi and Hayashi [23] proposed to train feedforward neural networks to implement general fuzzy partitions. The price is that the synaptic weights of the trained networks have no clear physical meaning. In order to make a satisfactory tradeoff between the capability and the complexity, one option is to use an aggregation of hyperrectangles to fit a general fuzzy region. By doing this, we are able to transform the rule expressed in (4) into the following rule:

$$R^{(j)}:\ \text{If } x \text{ is } HR_1^j \cup \cdots \cup HR_k^j \text{ then } y \text{ is } c_0^j \tag{5}$$

where $HR_k^j$ represents a fuzzy set defining an $n$-dimensional hyperrectangle. A major distinction between the two rules expressed in (3) and (5) is illustrated in Fig. 3. It is apparent that, by using the latter type of rule, we can omit several unnecessary grid partitions so as to greatly decrease the number of fuzzy rules. For the case shown in Fig. 3, we need only three large rectangles to fit the -typed region instead of using nine smaller rectangles. In this new type of fuzzy model, the structure identification is to decide the number of fuzzy hyperrectangles. The parameter identification is performed to estimate the parameters defining the corresponding hyperrectangles (e.g., sizes and locations) and the coefficients in the consequent parts. Based on these discussions, we claim that the new fuzzy model is more efficient than singleton fuzzy models while sharing the same merits. The price that the new type of fuzzy model has to pay is that an efficient algorithm has to be developed to complete the structure identification and the parameter identification. To efficiently identify such a fuzzy model, we propose to train a two-layer FHRCNN to construct it. The architecture and characteristics of FHRCNN’s are explained in the next section.

Fig. 3. Grid partitions versus general partitions. The dashed lines represent grid partitions. The solid lines represent general partitions, and the -shaped region represents the region to be approximated.

III. FUZZY NEURAL NETWORKS

The class of FHRCNN’s is a fuzzified version of the class of HRCNN’s. In our previous work [9], [10], we have shown that the values of the synaptic weights of a trained HRCNN can be interpreted as a set of crisp if–then rules. Fig. 4(a) illustrates a symbolic representation of a two-layer HRCNN.

Fig. 4. (a) Symbolic representation of a two-layer HRCNN, and (b) symbolic representation of a two-layer FHRCNN.

The mathematical description of a two-layer HRCNN is given as follows:

$$\mathrm{Out}(x) = f\!\left(\sum_{j=1}^{J} \mathrm{Out}_j(x) - \eta\right) \tag{6}$$

$$\mathrm{Out}_j(x) = f(\mathrm{net}_j(x)) \tag{7}$$

$$\mathrm{net}_j(x) = \sum_{i=1}^{n} f\big((M_{ji} - x_i)(x_i - m_{ji})\big) - n \tag{8}$$

where

$$f(u) = \begin{cases} 1, & \text{if } u \ge 0 \\ 0, & \text{otherwise.} \end{cases} \tag{9}$$

Here, $M_{ji}$ and $m_{ji} \in \mathbb{R}$ are adjustable synaptic weights of the $j$th hidden node with the property $M_{ji} \ge m_{ji}$; $x = (x_1, x_2, \ldots, x_n)^T$ is an input pattern, $\eta$ is a positive constant less than one, $\mathrm{Out}_j(x)$ is the output function of the $j$th hidden node, and $\mathrm{Out}(x)$: [...] $M_{ji}(t)$ or $\frac{1}{2}(m_{ji}(t) - x_i)$ if $x_i < m_{ji}(t)$.

A two-layer FHRCNN, as shown in Fig. 4(b), employs a special membership function $m_j(x)$ instead of the hard-limiter function $f$ defined in (9) as the output function of each hidden node. The membership function $m_j(x)$ measures the degree to which an input pattern is close to the hyperrectangle defined by $[m_{j1}, M_{j1}] \times \cdots \times [m_{jn}, M_{jn}]$. A mathematical representation of a two-layer FHRCNN is of the form

$$\mathrm{Out}(x) = \sum_{j=1}^{J} w_j m_j(x) + \theta \tag{14}$$

$$m_j(x) = \exp\!\left\{-s_j^2\,[\mathrm{vol}_j(x) - \mathrm{vol}_j]^2\right\} \tag{15}$$

$$\mathrm{vol}_j = \prod_{i=1}^{n} (M_{ji} - m_{ji}) \tag{16}$$

and

$$\mathrm{vol}_j(x) = \prod_{i=1}^{n} \max(M_{ji} - m_{ji},\ x_i - m_{ji},\ M_{ji} - x_i) \tag{17}$$

where $w_j$ is the connection weight from the $j$th hidden node to the output node, $s_j$ is the sensitivity factor that regulates the membership value, and $\theta$ is an adjustable bias term. The output function of a two-layer FHRCNN is thus a linear weighted combination of $J$ local functions. From (15)–(17), it is easy to see that the function $m_j(x)$ provides more flexibility than the Gaussian function does because it can be either a step-like function or a Gaussian-like function, as shown in Fig. 7(a) and (b). Based on this comparison, we may conclude that a two-layer FHRCNN is a universal function approximator whose efficiency is better than that of an RBF network adopting Gaussian functions.
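The forward computations of the crisp HRCNN in (6)–(9) and of the FHRCNN in (14)–(17) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the default threshold `eta=0.5` is merely one admissible value (the text only requires a positive constant less than one), and the volumes are assumed to be products of side lengths over the n dimensions.

```python
import math

def hard_limiter(u):
    """Hard limiter of (9): 1 if u >= 0, otherwise 0."""
    return 1 if u >= 0 else 0

def hrcnn_output(x, lower, upper, eta=0.5):
    """Crisp two-layer HRCNN, (6)-(8). lower[j][i] = m_ji, upper[j][i] = M_ji.
    eta is an assumed value; the text only says 0 < eta < 1."""
    n = len(x)
    fired = 0
    for m_j, M_j in zip(lower, upper):
        # net_j >= 0 only if x lies inside the j-th hyperrectangle in every dimension
        net_j = sum(hard_limiter((M - xi) * (xi - m))
                    for xi, m, M in zip(x, m_j, M_j)) - n
        fired += hard_limiter(net_j)
    return hard_limiter(fired - eta)

def membership(x, m_j, M_j, s_j):
    """Membership function of (15)-(17); volumes assumed to be products."""
    vol_j = math.prod(M - m for m, M in zip(m_j, M_j))
    vol_jx = math.prod(max(M - m, xi - m, M - xi)
                       for xi, m, M in zip(x, m_j, M_j))
    return math.exp(-(s_j ** 2) * (vol_jx - vol_j) ** 2)

def fhrcnn_output(x, lower, upper, s, w, theta):
    """Two-layer FHRCNN output of (14): weighted sum of memberships plus bias."""
    return sum(w_j * membership(x, m_j, M_j, s_j)
               for m_j, M_j, s_j, w_j in zip(lower, upper, s, w)) + theta
```

For a single unit square `lower=[[0.0, 0.0]]`, `upper=[[1.0, 1.0]]`, a point inside the square gives a membership of exactly 1, and the membership decays as the point moves away, at a rate set by `s_j`.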

Basically, a two-layer FHRCNN can be trained either by the backpropagation algorithm [24], [25] or by the real-valued genetic algorithm [26]. Whichever training procedure is adopted, starting from a good initial point greatly accelerates convergence. In order to give a satisfactory solution to the initialization problem, we have to make full use of both the linguistic information from human experts and the numerical information from experimental data. As the architecture of a two-layer FHRCNN provides a convenient framework to incorporate human experts’ knowledge, we are able to find a good guess of the values of $M_{ji}$, $m_{ji}$, and $w_j$ based on given linguistic rules. As for the numerical information, the initial weights can be estimated by transforming the function approximation problem into a pattern recognition problem. The whole training procedure is addressed in the next section.

IV. IDENTIFICATION OF THE NEURAL-NETWORK-BASED FUZZY MODELS

The hybrid learning algorithm developed to train a two-layer FHRCNN consists of the following steps. Note that here we focus on multi-input/single-output systems since a large multi-input/multi-output system can be broken into several small multi-input/single-output subsystems.

A. Step 1: Partition the Output Space into Multiple Intervals

If the problems to be considered are pattern recognition problems instead of function approximation problems, we skip this step and go directly to Step 2. Here we first have to determine the number of partitions. The greater the number of partitions, the higher the approximation accuracy. Actually, no definite theoretical guide for this choice is available. One way of doing this is to use any simple clustering algorithm that provides an estimate of the number of clusters from the output data set. For instance, we may use the output data to form a one-dimensional (1-D) histogram and then identify significant peaks in the histogram.
The number of significant peaks would provide a good estimate of the number of partitions. After the number of partitions has been set (say, $k$), we may uniformly partition the output space into $k$ intervals or use the k-means algorithm to find a more appropriate partition. By doing this, the original input–output pairs are transformed into quantized input–output pairs.

B. Step 2: Transform a Function Approximation Problem into a Pattern Recognition Problem

Since the outputs have been labeled into one of the $k$ classes, we use the SDDL algorithm to generate $k$ two-layer HRCNN’s to classify these quantized input–output patterns. Assume that $h_i^c$ hidden nodes were generated in the $i$th trained HRCNN for $i = 1, 2, \ldots, k$. For each trained HRCNN, the hidden nodes are then ranked by the number of patterns contained in the corresponding hyperrectangles.

C. Step 3: Initialize and Update Weights

In order to make a satisfactory compromise between the approximation accuracy and the number of hidden nodes of the FHRCNN to be trained, we select the first $n_i$ ranked hidden nodes out of the $h_i^c$ nodes from each of the trained HRCNN’s for $i = 1, 2, \ldots, k$ to initialize the weights ($M_{ji}$ and $m_{ji}$) of the FHRCNN. Therefore, the number of hidden nodes of the FHRCNN to be trained is set to $n_1 + n_2 + \cdots + n_k$. The next task is to initialize the connection weights $w_j$ for $j = 1, 2, \ldots, n_1 + \cdots + n_k$. The idea is that it is reasonable to expect that input patterns whose corresponding outputs were labeled the same in the first step would result in similar outputs. Therefore, we initialize $w_j$ to be the mean value of the corresponding interval.

After we have initialized all adjustable weights, we have to fine-tune them such that the squared error between the system output and the output of the FHRCNN is minimized. Here we have two options for fine-tuning. The first option is to use the real-valued genetic algorithm proposed by Su and Chang [26] to find a set of approximate weights. The second option is to use gradient-based methods to train the FHRCNN. Note that here we also have two different ways to use gradient-based methods. The first is to fix the values of $M_{ji}$ and $m_{ji}$ and update only the connection weights $w_j$ and the bias $\theta$ using the recursive LMS algorithm. However, it is advantageous to update all weights $M_{ji}$, $m_{ji}$, $\theta$, and $w_j$ simultaneously because the modeling performance of the FHRCNN will be significantly improved in this manner. Therefore, the second and more efficient way is to use the backpropagation algorithm for adaptation.
It has to be emphasized that the parameter $s_j$ can also be updated in the same manner as the other parameters during the training procedure, by updating $s_j$ in the reverse direction of the partial derivative of the squared error with respect to $s_j$.

D. Step 4: Interpret the Trained FHRCNN

After sufficient training, the squared error can be minimized to arbitrary accuracy. The values of the weights are then interpreted as meaningful fuzzy if–then rules. Since the function $m_j(x)$ measures the degree to which an input pattern $x$ is close to the $n$-dimensional hyperrectangle defined by $[m_{j1}, M_{j1}] \times \cdots \times [m_{jn}, M_{jn}]$, we may define a fuzzy set $HR_j$ characterized by $m_j(x)$ in the input space, with $m_j(x)$ representing the grade of membership of $x \in \mathbb{R}^n$ in the fuzzy set $HR_j$. Then the presentation of an input pattern to the $j$th hidden node is equivalent to firing the following fuzzy rule:

$$R^{(j)}:\ \text{If } (x \text{ is } HR_j) \text{ then } \mathrm{Out}(x) \text{ is } w_j. \tag{18}$$

The total number of extracted fuzzy rules is $n_1 + \cdots + n_k$ since this is the number of hidden nodes in the trained FHRCNN. The final crisp output is computed as follows:

$$\mathrm{Out}(x) = \sum_{j=1}^{J} w_j m_j(x) + \theta. \tag{19}$$
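Parts of Steps 1 and 3 can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the authors' procedure: the hyperrectangles (`lower`, `upper`) are taken as already given (the SDDL algorithm of Step 2 is not shown), the output space is partitioned uniformly, "the mean value of the corresponding interval" is read as the mean of the training outputs falling in that interval, and the fine-tuning shown is a plain per-sample LMS (Widrow–Hoff) update of $w_j$ and $\theta$ with the hyperrectangles held fixed, which may differ from the recursive LMS variant used in the paper. All function names are hypothetical.

```python
import math

def membership(x, m_j, M_j, s_j):
    """m_j(x) as in (15)-(17), with volumes assumed to be products."""
    vol_j = math.prod(M - m for m, M in zip(m_j, M_j))
    vol_x = math.prod(max(M - m, xi - m, M - xi) for xi, m, M in zip(x, m_j, M_j))
    return math.exp(-(s_j ** 2) * (vol_x - vol_j) ** 2)

def predict(x, lower, upper, s, w, theta):
    """Crisp output as in (19)."""
    return sum(wj * membership(x, m_j, M_j, sj)
               for m_j, M_j, sj, wj in zip(lower, upper, s, w)) + theta

def quantize_uniform(ys, k):
    """Step 1 (uniform variant): label each output with one of k equal-width intervals."""
    lo, hi = min(ys), max(ys)
    width = (hi - lo) / k or 1.0  # guard against a constant output
    return [min(int((y - lo) / width), k - 1) for y in ys]

def interval_means(ys, labels, k):
    """Mean output per interval; one possible reading of the initial w_j."""
    means = []
    for c in range(k):
        members = [y for y, lab in zip(ys, labels) if lab == c]
        means.append(sum(members) / len(members) if members else 0.0)
    return means

def lms_finetune(xs, ys, lower, upper, s, w, theta, lr=0.1, epochs=50):
    """Step 3 (first gradient option): update only w_j and theta, rectangles fixed."""
    w = list(w)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            acts = [membership(x, m_j, M_j, sj) for m_j, M_j, sj in zip(lower, upper, s)]
            err = y - (sum(wj * a for wj, a in zip(w, acts)) + theta)
            w = [wj + lr * err * a for wj, a in zip(w, acts)]
            theta += lr * err
    return w, theta
```

In this reading, each hidden node selected from the HRCNN of class c would start with `w_j = interval_means(...)[c]` before fine-tuning.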

V. SIMULATIONS

The class of FHRCNN’s was investigated for transient stability prediction using the IEEE 39-bus ten-generator system shown in Fig. 8, as reported in [27]. In our simulation model, each generator is modeled by seventh-order differential equations and the loads are modeled as constant impedances. A detailed description of this model is given in [28]. Three-phase short-circuit-to-ground faults with a four-cycle clearing time are simulated on various transmission lines. The postfault system configuration is the same as the prefault configuration, except that the faulted line is removed. Each fault example contains the simulated postfault phasor measurements together with an indication of whether that fault results in instability. Large numbers of examples are aggregated into the training and test sets, from which an FHRCNN is constructed and tested. The following subsections describe the methodology for generating the training and test sets.

Fig. 8. One-line diagram of the IEEE 39-bus system equipped with ten PMU’s.

A. The Predictor

Stability prediction is based on an eight-cycle window of phasor measurements that begins at the fault clearing time $T_c$. Three consecutive measurements, four cycles apart, are taken of each of the ten generator angles: the first at $T_c$, the second at $T_c + 4/60$, and the last at $T_c + 8/60$. The generator angles, measured in radians in center-of-angle coordinates, are first written to a data file with three digits after the decimal point. Two velocities and one acceleration are computed from the angles of each generator, for a total of six predictors per generator. Denoting the three angle measurements from the $i$th generator as $\delta_i(0)$, $\delta_i(1)$, $\delta_i(2)$, we compute

$$v_i(0) = 10 \times [\delta_i(1) - \delta_i(0)] \tag{20}$$

$$v_i(1) = 10 \times [\delta_i(2) - \delta_i(1)] \tag{21}$$

$$a_i(0) = 20 \times [\delta_i(2) - 2 \times \delta_i(1) + \delta_i(0)]. \tag{22}$$
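The predictor construction described above can be sketched in Python as follows. This is a minimal sketch, not the authors' C++ program: the function name `build_predictors`, the array layout, and the ordering of the six values per generator are assumptions made for illustration, while the scale factors 10 and 20 are taken directly from (20)-(22) as printed.

```python
import numpy as np


def build_predictors(angles):
    """Build the predictor vector from three angle samples per generator.

    angles: array of shape (n_generators, 3) holding, for each generator,
    the angle in radians at the clearing time and at the two later samples.
    Returns a flat vector with six values per generator:
    three angles, two velocities, and one acceleration (assumed ordering).
    """
    d = np.round(np.asarray(angles, dtype=float), 3)  # three digits after the decimal point
    v0 = 10.0 * (d[:, 1] - d[:, 0])                   # first velocity, as in (20)
    v1 = 10.0 * (d[:, 2] - d[:, 1])                   # second velocity, as in (21)
    a0 = 20.0 * (d[:, 2] - 2.0 * d[:, 1] + d[:, 0])   # acceleration, as in (22)
    return np.column_stack([d, v0, v1, a0]).ravel()


# Ten generators with identical made-up angle samples, for illustration only.
angles = np.tile([0.1, 0.2, 0.4], (10, 1))
features = build_predictors(angles)
print(features.shape)
```

With ten generators the function returns a vector of length sixty, which matches the number of predictors per example stated in the text.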

Consequently, each example for the IEEE 39-bus test system contains sixty predictors.

B. Instability Criterion

A fault is declared unstable if the difference between any two generator angles exceeds 180° within the four seconds after the clearing time. Otherwise, the fault is declared stable.

C. Training Set and Test Set

For a given fault location, duration, and clearing action, the fault-on and postfault trajectories are obtained from the PSS/E computer package developed by Power Technologies, Inc. (PTI). Several operating points were generated to test the proposed fuzzy neural network on a stressed system and to study the method’s robustness to variations in the operating point. Our base case was obtained by increasing the real power of each individual load by 25%. The extra power generation was spread uniformly among the generators. We chose an increase of 25% because it lowered the critical clearing times while maintaining an acceptable load-flow solution. Fifty operating points were generated from the base case by applying random changes to key parameters such as load, shunt compensation, active and reactive generation scheduling, and topology. The random numbers were drawn from a uniform rather than a Gaussian distribution, and a different string of random numbers was used for each operating point.

A bus fault refers to a fault at the end of a transmission line that is cleared by removing the line. A midline fault refers to a fault in the middle of a transmission line. For the training set, we simulate 650 bus faults and 350 line faults per operating point, of which 330 faults are unstable and 670 are stable. For the test set, we simulate, per operating point and separately from the training set, 350 bus faults and 150 line faults, of which 165 faults are unstable and 335 are stable.

D. Results

The simulation program was developed in C++ on a SUN SPARC II. For comparison, we also ran simulations on MLP’s trained by the backpropagation algorithm, using the same number of hidden neurons, the same training set, and the same test set. A two-layer FHRCNN and a two-layer MLP (with one hidden layer) were first trained on the training set using 61 neurons. It took about 2 h to train the FHRCNN and about 3 h to train the MLP. Both networks were then evaluated on the test set. For both networks, it took only a few hundredths of a second to predict stability over the 4 s following fault clearing. The classification rates of both networks are given in Table I. The numbers indicate the percentage of predictors correctly classified.

TABLE I
CLASSIFICATION RATES OF THE FHRCNN AND THE MLP FOR STABILITY PREDICTION

In addition, we compared the performance based on the number of hidden nodes. The results are tabulated in Table II.

TABLE II
PERFORMANCE COMPARISON BETWEEN FHRCNN’s AND MLP’s BASED ON THE NUMBER OF HIDDEN NODES

From the simulation results, we make the following observations.
• FHRCNN’s achieve high classification rates.
• FHRCNN’s perform better than traditional MLP’s.
• FHRCNN’s have the potential to be an online tool for real-time transient stability prediction in power systems.

VI. CONCLUSION

We have suggested a neural-network-based fuzzy model for modeling a system. The class of FHRCNN’s studied in this paper integrates the paradigm of neural networks with the fuzzy-rule-based approach, rendering them more useful than either. Moreover, our model can capture the correlations between input variables more efficiently than the three popular fuzzy models described in Section II. An algorithm that makes full use of both linguistic and numerical information to train FHRCNN’s has been discussed. In this paper, the potential for using FHRCNN’s to solve transient stability prediction problems has been demonstrated. Computer simulations demonstrated the superiority of the proposed method over MLP’s in learning speed, network complexity, and especially classification performance. We have shown that the best classification accuracies were as high as 99%. In addition, the computational burden proved to be quite acceptable. The ability to predict with acceptable accuracy what will happen in the near future following a transient event opens new possibilities for power system protection and control.

REFERENCES

[1] G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Math. Contr., Signals, Syst., vol. 2, pp. 303–314, 1989.
[2] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, pp. 359–366, 1989.
[3] K. Funahashi, “On the approximate realization of continuous mappings by neural networks,” Neural Networks, vol. 2, pp. 183–192, 1989.
[4] L. A. Zadeh, “The concept of a linguistic variable and its application to approximate reasoning,” Inform. Sci., vol. 8, pp. 199–257, 1975.
[5] T. Takagi and M. Sugeno, “Fuzzy identification of systems and its application to modeling and control,” IEEE Trans. Syst., Man, Cybern., vol. SMC-15, pp. 116–132, Jan. 1985.
[6] W. Pedrycz, Fuzzy Control and Fuzzy Systems. New York: Wiley, 1993.
[7] W. Pedrycz, “An identification algorithm in fuzzy relational systems,” Fuzzy Sets Syst., vol. 13, pp. 153–167, 1984.
[8] R. Babuska and H. B. Verbruggen, “A new identification method for linguistic fuzzy models,” in Proc. IEEE FUZZ, Japan, 1995, pp. 905–912.
[9] M.-C. Su, “A neural network approach to knowledge acquisition,” Ph.D. dissertation, Univ. Maryland, College Park, Aug. 1993.
[10] M.-C. Su, “Use of neural networks as medical diagnosis expert systems,” Comput. Biol. Med., vol. 24, no. 6, pp. 419–429, 1994.
[11] M.-C. Su and C.-J. Kao, “Time series prediction based on a novel neurofuzzy system,” in Proc. 4th Golden West Int. Conf. Intell. Syst., San Francisco, CA, 1995, pp. 229–233.
[12] M.-C. Su, “Identification of singleton fuzzy models via fuzzy hyperrectangular composite NN,” in Fuzzy Model Identification: Selected Approaches, H. Hellendoorn and D. Driankov, Eds. Berlin, Germany: Springer-Verlag, 1997, pp. 193–212.
[13] S. Rovnyak, S. Kretsinger, J. Thorp, and D. Brown, “Decision trees for real-time transient stability prediction,” IEEE Trans. Power Syst., vol. 9, no. 3, pp. 1417–1424, 1994.
[14] V. Centeno et al., “Adaptive out-of-step relaying using phasor measurement techniques,” IEEE Comput. Applicat. Power, vol. 6, pp. 12–17, Jan. 1993.
[15] S. Rovnyak, C.-W. Liu, J. Lu, W. Ma, and J. Thorp, “Predicting future behavior of transient events rapidly enough to evaluate remedial control options in real-time,” IEEE Trans. Power Syst., vol. 10, pp. 1195–1203, Nov. 1995.
[16] C.-W. Liu and J. S. Thorp, “Application of synchronized phasor measurements to real-time transient stability prediction,” Proc. Inst. Elect. Eng. C, vol. 142, no. 4, pp. 355–360, 1995.
[17] J. S. Thorp, A. G. Phadke, S. H. Horowitz, and M. M. Begovic, “Some applications of phasor measurements to adaptive protection,” IEEE Trans. Power Syst., vol. 3, pp. 791–798, May 1988.
[18] L. X. Wang, Adaptive Fuzzy Systems and Control: Design and Stability Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1994.
[19] L. X. Wang and J. M. Mendel, “Generating fuzzy rules from numerical data with applications,” USC SIPI Rep. 169, 1991; also in IEEE Trans. Syst., Man, Cybern., vol. 22, pp. 1414–1427, Nov. 1992.
[20] M. Sugeno and T. Yasukawa, “A fuzzy-logic-based approach to qualitative modeling,” IEEE Trans. Fuzzy Syst., vol. 1, pp. 7–31, Jan. 1993.
[21] J.-S. Roger Jang and C.-T. Sun, “Functional equivalence between radial basis function networks and fuzzy inference systems,” IEEE Trans. Neural Networks, vol. 4, pp. 156–159, Jan. 1993.
[22] L. X. Wang and J. M. Mendel, “Back-propagation fuzzy systems as nonlinear dynamic system identifiers,” in Proc. IEEE Int. Conf. Fuzzy Syst., San Diego, CA, 1992, pp. 1409–1418.


[23] H. Takagi and I. Hayashi, “NN-driven fuzzy reasoning,” Int. J. Approximate Reasoning, vol. 5, no. 3, pp. 192–212, May 1991.
[24] P. Werbos, “New tools for predictions and analysis in the behavioral science,” Ph.D. dissertation, Harvard Univ., Cambridge, MA, 1974.
[25] D. E. Rumelhart and J. L. McClelland, Eds., Parallel Distributed Processing. Cambridge, MA: MIT Press, 1986.
[26] M.-C. Su and H.-T. Chang, “A real-valued GA-based approach to extracting fuzzy rules for system identification,” in Proc. Int. Joint Conf. Fuzzy Theory Applicat., 1995, pp. 41–46.
[27] M. A. Pai, Energy Function Analysis for Power System Stability. Boston, MA: Kluwer, 1989.
[28] P. M. Anderson and A. A. Fouad, Power System Control and Stability. Ames, IA: Iowa State Univ. Press, 1977.

Fig. 1. Typical set of acquired data (SO2), showing some unrealistic values.

A Procedure for the Optimization of Air Quality Monitoring Networks

B. Andò, G. Cammarata, A. Fichera, S. Graziani, and N. Pitrone

Abstract—This paper deals with the processing of air pollutant measures acquired in urban areas. A procedure for the optimization of a network for air pollutant monitoring is addressed. The proposed procedure allows for the optimization of the number of monitoring stations by using adequate mathematical models. Moreover, a continuous two-dimensional (2-D) map is used to search for the optimal reallocation of the monitoring stations. Reconfiguration of the available hardware leads to better performance. Index Terms— Air pollution, monitoring network, neural network, optimal location.

Manuscript received February 9, 1997; revised December 1, 1997 and April 30, 1998.
B. Andò, S. Graziani, and N. Pitrone are with the Dipartimento Elettrico, Elettronico e Sistemistico, Università di Catania, 95125 Catania, Italy (email: [email protected]).
G. Cammarata and A. Fichera are with the Istituto di Fisica Tecnica, Università di Catania, 95125 Catania, Italy.
Publisher Item Identifier S 1094-6977(99)00096-6.

I. INTRODUCTION

Many human activities produce pollutants that can greatly modify the composition of the atmosphere [1]. This phenomenon is more evident in urban areas, due to motor vehicle emissions and domestic heating systems, and in industrial areas, due to industrial activity. Thermal inversion can, moreover, insulate these areas [2]. The dispersion of pollutants therefore becomes increasingly difficult. A great number of pollutants are likely to cause problems for both human health and ecological systems. In order to limit the maximum concentration level, several countries have developed regulations that depend on the particular pollutant. In some urban areas, public safety authorities impose a dramatic reduction in vehicle traffic as soon as a pollution level exceeding the maximum safety limit is detected. The reader can refer to several documents available on this topic [3]–[5].

In order to perform air quality management and apply an assessment methodology, an effective monitoring system is required. It is necessary, in fact, to collect data on pollutant concentrations and, therefore, to install measuring systems. In the last few years, many techniques for the optimal allocation of monitoring stations have been developed [6]–[13]. The Monte Carlo variance reduction method, air quality simulation models, and optimization techniques have been investigated. Due to both the complexity of urban and industrial areas and the influence of meteorological quantities, the suggested techniques are not easy to apply and further research effort is required.

In this paper, a new methodology for the optimization of the number of stations in a monitoring network is proposed. A “first trial” monitoring network is assumed to be available. This assumption is not excessively restrictive: an initial set of data is required to start the optimization procedure. The first aim of this work is to determine the smallest number of monitoring stations needed to guarantee the desired precision in the estimation of pollutant levels. Mathematical models are identified to define the level of a pollutant at a monitoring point as a function of the pollutant levels recorded at other points. Hence, the set of monitoring stations that supply redundant data is determined. The smallest number of monitoring stations that must be left “alive” depends on the precision required for the determination of the pollutant levels. Those stations that can be turned off, on the basis of the previous analysis, are available for reallocation. The second step of the proposed procedure therefore searches for the most “significant” places in the area under consideration for the reallocation of these monitoring stations.

II. DATA PREPROCESSING

Preliminary processing of the data collected by the monitoring network is generally required to eliminate meaningless data due to errors occurring during the acquisition stage. Monitoring stations can fail to acquire data due to problems occurring during both the transmission and the calibration phases. An example of unrealistic data is reported in Fig. 1. Two kinds of incorrect data are shown, indicated by the arrows “A” and “B,” respectively.
The arrow A points to a pollutant level value that is unrealistically high, while the arrow B shows samples that have been set equal to zero. In the following subsections, the two methods used to detect errors are described.

A. Threshold Method

Both low and high thresholds for the levels of each pollutant can be established on the basis of the following considerations:
1) a low concentration of each pollutant is present in the atmosphere, regardless of any human activity;

1094–6977/99$10.00 © 1999 IEEE
