A Modular Methodology for Fast Fault Detection and Classification in Power Systems

Fahmida N. Chowdhury, Member, IEEE, and Jorge L. Aravena, Member, IEEE
Abstract— This paper presents a modular yet integrated approach to the problem of fast fault detection and classification. Although the specific application example studied here is a power system, the method would be applicable to arbitrary dynamic systems. The approach is quite flexible in the sense that it can be model-based or model-free. In the model-free case, we emphasize the use of concepts from signal processing and wavelet theory to create fast and sensitive fault indicators. If a model is available then conventionally generated residuals can serve as fault indicators. The indicators can then be analyzed by standard statistical hypothesis testing or by artificial neural networks to create intelligent decision rules. After a detection, the fault indicator is processed by a Kohonen network to classify the fault. The approach described here is expected to be of wide applicability. Results of computer experiments with simulated faulty transmission lines are included.
Fig. 1. Overall scheme.

Fig. 2. Residual generation with existing model.
Index Terms— Fault classification, fault detection, Kohonen networks, neural networks, power systems, real-time, wavelet transformation.
I. INTRODUCTION
A. Overview of the Modular Methodology
THE task of fast fault detection includes two major parts: 1) creation of a measure to serve as the indicator of normal–abnormal behavior and 2) design of a decision rule, based on that measure, to detect the fault. After detection, an additional phase of classification may be required. These three modules are described below. Our main focus is to develop a methodology that does not necessarily rely on the use of mathematical models. If models are available they can be used to advantage, but the technique can be implemented without an explicit model. The three-module scheme is shown in Fig. 1.

1) Generation of fault indicators (Module I): This can be done in two major ways.

a) Model-based: This is the most widely used method. Here, a residual is generated, which is typically the difference between the actual system's output and the output predicted by a model. This we shall call the model-based method, which is fully compatible with our modular methodology. There are two possibilities here.

Manuscript received September 27, 1996. Recommended by Associate Editor, G. J. Rogers. The work of F. Chowdhury was supported in part by NSF Grant ECS-9526341. F. N. Chowdhury is with the Department of Electrical and Computer Engineering, University of Southwestern Louisiana, Lafayette, LA 70504-3890 USA. J. L. Aravena is with the Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803 USA. Publisher Item Identifier S 1063-6536(98)06251-4.
• An accurate mathematical or input–output (I/O) model of the nominal (fault-free) system is available, and we can use the residual directly in the second module of our scheme. This is the best possible case; however, in many practical situations this does not hold.

• A mathematical model is not available, but an I/O model (such as an ARMA model if the system is linear, or a neural-network model if the system is nonlinear) can be built on-line. This can lead to the generation of a residual which can be used in Module II; a small sketch of this kind of on-line residual generation follows the list. However, I/O model building can be a very hard task if the system in question is nonlinear. Moreover, currently available I/O modeling techniques, including neural-network methods, suffer from many restrictions. For example, the order of the system must be known or must be discovered by trial and error; one must assume that the system will operate fault-free for a long enough time so that a nominal I/O model can be developed; etc.
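The sketch below illustrates this second possibility under simple assumptions: a single monitored signal, a linear AR structure of assumed order, and a least-squares fit on a fault-free record. The function names, the AR order, and the test signal are illustrative choices rather than part of the paper's method; they only show how a residual can be formed from data once an I/O model has been identified.

```python
import numpy as np

def fit_ar_model(y, order):
    """Least-squares fit of y[k] = a1*y[k-1] + ... + ap*y[k-p] on a fault-free
    record (an illustrative stand-in for on-line I/O model building)."""
    rows = np.array([y[k - order:k][::-1] for k in range(order, len(y))])
    a, *_ = np.linalg.lstsq(rows, y[order:], rcond=None)
    return a

def residual(y, a):
    """One-step prediction error (the residual) of new data y under model a."""
    p = len(a)
    pred = np.array([a @ y[k - p:k][::-1] for k in range(p, len(y))])
    return y[p:] - pred

# Usage: identify the model on a fault-free record, then monitor the residual.
rng = np.random.default_rng(0)
t = np.arange(0, 0.5, 1e-3)
nominal = np.sin(2 * np.pi * 60 * t) + 0.01 * rng.standard_normal(t.size)
a = fit_ar_model(nominal, order=4)
r = residual(nominal, a)        # stays small while behavior remains nominal
```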
Figs. 2 and 3 show the residual-generating techniques.

Fig. 3. Residual generation with I/O model.

b) Model-free: There are many situations when an accurate mathematical model is either unavailable
or is too complex, and the task of building an I/O model is not practical. For such cases we propose a model-free method of generating the fault indicator. The principle behind this approach is: if the fault is detectable, it must produce changes in the monitored variables; these changes may be small but can be enhanced with signal processing techniques. Here we describe the creation of an orthogonal decomposition based on multirate filter banks. Incoming data (for example, measured voltages and currents) are processed with the multirate filter bank to generate a set of sensitive fault indicators, without requiring a model of the system from which the data are coming. Generation of fault indicators without a mathematical model of the system is one of the main contributions of this paper, and is described in detail in later sections.

2) Fault detection (Module II): The fault indicators (regardless of how they have been generated) can be tested by conventional hypothesis testing methods, but since in general the indicators are vectors, the design of multiple hypothesis testing becomes very complicated. In this paper we describe an alternate technique where the fault indicators are processed by a three-layer feedforward neural network. This neural net works as a hypothesis tester to answer the question: does a fault exist, or is this a normal situation? The details of this neural network depend upon whether its inputs are residuals or model-free indicators. The neural network for use with residual-based methods is described briefly in the paper. The necessary changes for adapting this network to the wavelet-based fault indicators are also developed and presented.

3) Fault classification (Module III): Only if a fault is detected are the indicators entered into the classification network. The classification network is a self-organizing neural net that works as a pattern classifier and produces information on fault type and location. The actual operation of this module is dependent on whether or not a system model is available.

a) In the absence of a model, this module can be incorporated with an expert system which is based on historical data of typical faults for the particular system. If such a knowledge base is available, then the classification net would obtain exemplars for specific fault classes from Module I, and there would be no need to train Module III. In this case, Module I would provide exemplars by processing the actual
fault data with the filter bank and thus generating fault indicators for known types of faults.

b) If an accurate and convenient model is available, then we can introduce simulated faults into the model, generate specific types of residuals for specific types of faults, and use them as exemplars in the neural net of Module III. In this case, Module I operates as a residual-based technique. Alternately, system or model responses during specific simulated faults can be processed with the filter bank and used as exemplars for Module III, in which case Module I would remain model-free.

B. Current Research in the Field

In the context of fault detection in general, the dynamical systems community is actively researching various approaches to fast detection and isolation. The most visible efforts are residual-based. Figs. 2 and 3 show two variations of the residual-based method of generating the fault indicator. For a good discussion of general-purpose failure-detection methods, see [1]. In [1] the authors summarize the available approaches and develop a general methodology for the task. However, they do not mention any method which is model-free and can be implemented using only measured system outputs.

Our survey of fault-detection methods in the specific context of power systems also shows that model/residual-based techniques dominate the field. A recent report1 available on the World Wide Web [2] indicates that the problem of detecting high-impedance faults is far from solved. In general, fault detection has remained an active area of research, and many different methods have been proposed. For example, approaches based on fixed-gain filters [3], Kalman filters [4]–[7], fuzzy logic [8], and travelling waves [9] have been explored. The common technique in the residual-based methods is to generate estimated outputs (either from a "known" nominal model or by using system identification methods) and take the error (the difference between estimated and actual values) as the indicator of normal/abnormal behavior. Whenever this indicator deviates from its theoretical value, one can assume that a fault has happened. The differences between the various fault-detection methods mainly lie in two major areas: how to generate the residuals, and how to test them. However, all these methods rely on the availability (or estimation) of accurate mathematical models, and are therefore subject to all the limitations of modeling uncertainties, unknown nonlinearities in the actual system, etc.

1 This report is from the IEEE Power Systems Relaying Committee Working Group.

Neural nets have been well studied for power system applications. One can easily compile dozens of references on the subject [10]–[16]. However, the use of wavelets is very recent, and interest in this tool is now growing. An indication of such interest is the work by Robertson et al. [17]. A novel feature of our paper is the integration of the two tools (wavelets and neural networks) for the development of fast detectors which are not dependent on the availability of accurate models. Despite the large number of neural-net applications to power
systems, we believe that our use of them as hypothesis testers is unique. Their usefulness in this mode has been verified by one of the authors in conjunction with a Kalman-filter-based estimation process. The application was power system state estimation and fault detection [7]. Also, the available literature suggests that the main focus in power-system neural-net applications has been on supervised learning. We believe that unsupervised learning, with its recognized capability for extracting relationships present in the data, is a better method for the classification task. The feasibility of unsupervised learning for power system fault classification was studied by Lubkeman et al. [15]. Our method differs from the Lubkeman approach in the following ways.

• Model-free option: While our overall methodology is flexible enough to include the use of system models, we describe a model-free option. Techniques from signal processing provide a set of enhanced, sensitive, and definitive fault-indicator patterns without requiring a system model (while in [15], the data need to be processed with a Kalman filter, which presupposes that a system model is available).

• On-line: Our method is intended for on-line application, while clearly in [15] the application would have to be off-line, because the construction of the input patterns requires the availability of optimal estimates of pre- and postfault magnitudes and angles of voltage and current phasors, etc.

For the development of indicators, it is desirable to use techniques which are, as much as possible, based only on general cause/effect phenomena and not dependent on the availability of an explicit mathematical model. Then the technique will be applicable to a large class of situations. Although it is possible to use neural nets to generate fault indicators, at the present stage of development it appears that the feasibility of most of the approaches is based on simulation studies. Lacking a theoretical guarantee of workability, these methods usually cannot be generalized. Hence our decision to try a fresh approach based on a theoretically well-established tool from the field of applied mathematics and signal processing. However, the demonstrated success of neural nets as decision-makers and pattern classifiers either equals or surpasses that of the commonly used statistical methods. Our wavelet-based approach is meant to be used together with neural networks for decision-making and classification. We have performed simulation tests on a transmission line model to verify the workability of our approach.

The working assumption is that a detectable fault must affect the available instrumentation/data acquisition system. The effect may be very small in conventional instruments but can be enhanced with appropriate signal processing tools. The approach does not presume any a priori knowledge about the system being analyzed.

II. FILTER BANK FOR THE DEVELOPMENT OF MODEL-FREE FAULT INDICATORS

Any detectable fault must introduce transients in the observed data. These "irregularities" carry important information about the fault. It is generally accepted that a description
in a time-frequency space is well suited to the study of nonstationary phenomena. We quote from [18]: "Until recently, the Fourier transform was the main mathematical tool for analyzing singularities. The Fourier transform is global and provides a description of the overall regularity of signals, but it is not well adapted for finding the location and the spatial [temporal] distribution of singularities. This was a major motivation for studying the wavelet transform in mathematics and applied domains." It is reasonable to apply a similar argument to fault-induced transients.

If we assume that the signals are monitored continuously, the continuous wavelet transform yields a very useful time-scale (frequency) representation of a signal. Its determination, however, can be very time consuming. A more realistic approach is to assume a computer-based data acquisition system producing discrete-time signals which can be decomposed into wavelet packets [19]. In this case, one uses multirate filter banks to create representations of the discrete-time signal over different regions of the time/frequency domain. By a suitable selection of the filter banks, one can create very general time-frequency representations.

In the conventional applications of multirate filter banks, one has an analysis bank, which uses downsampling to reduce redundancy and to increase efficiency (e.g., by reducing the number of samples that must be sent over a communication channel), and a synthesis bank, which, in the case of perfect reconstruction, uses the output of the analysis bank to recreate the original signal. In the application described here, we use the outputs of the synthesis bank to decompose the signal into orthogonal components which have essentially no overlap in the frequency domain (zero overlap is theoretically impossible for real signals and filters). This last characteristic can be used to increase selectivity and sensitivity in the detection process. Selectivity is increased because one can differentiate transients whose frequency characteristics are very similar. Moreover, if one has information about the system, or a model, one can define a specialized bank to select the frequency bands that one should monitor; reducing (ideally, eliminating) frequency overlap would concentrate the energy of the fault-induced transient in a small number of bands, thereby increasing the sensitivity of the detector. Given our purpose of using minimal information about the system, we chose a filter bank which approximately partitions the discrete frequency range into bands of equal width.

In order to create efficient indicators we require that the signals created by the filter bank be mutually orthogonal in the time domain. In the approach presented here, we use outputs from the filter bank to create instantaneous information vectors, which are not necessarily orthogonal but show very quickly the effect of a fault. Here we expect a distinct advantage in using these signal processing tools. Since the monitored variables change in a continuous manner, the onset of a fault is very difficult to detect, and this introduces a delay in the detection process. If one can enhance the detection of the fault-induced transient then one can start taking corrective actions faster
Fig. 4. Filter bank. Analysis filters create the compressed components. Synthesis filters create the orthogonal components.
than conventional detectors would allow; it is not difficult to conceive a scenario where early (or earlier) warning is advantageous.

The filter bank described below is derived from Daubechies' orthogonal wavelets with compact support. However, it has been enhanced by the introduction of a bilateral decomposition to create a uniform wavelet packet. Fig. 4 illustrates the principle of the bilateral decomposition. The filters are both FIR; the down arrow denotes downsampling by two (taking every other sample) and the up arrow denotes upsampling by two (inserting a zero between two values). For the example described in the case study, both have 20 taps and have been taken from Daubechies' multiresolution work. The symbol $\tilde{H}$ is used to denote the paraconjugate of a filter $H$.2 A formal proof of the properties of the decomposition is beyond the scope of this paper. However, we emphasize the fact that the actual implementation of the process is well established in the signal processing field and produces efficient numerical schemes.

2 If the filter $H$ is causal, then its paraconjugate becomes anticausal. For real-time use, all filters must be causal. This is a common problem in DSP and it is solved by introducing a time delay. Thus, instead of perfect, instantaneous reconstruction, the filter bank is designed to be a perfect delay line.
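As an illustration of how such a uniform wavelet-packet decomposition can be computed, the sketch below uses the PyWavelets package (our choice of tool for the sketch; the authors' experiments were run in MATLAB/SIMULINK) with the 20-tap Daubechies filter db10 and three levels of bilateral splitting, giving eight approximately equal-width bands. The signal and the node handling are illustrative; only the overall structure, analysis followed by per-band synthesis into time-aligned orthogonal components, reflects the scheme of Fig. 4.

```python
import numpy as np
import pywt

def orthogonal_components(x, wavelet="db10", level=3):
    """Split x into 2**level frequency bands with a wavelet-packet (analysis)
    bank, then resynthesize each band separately so the components are
    time-aligned with the original signal."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, mode="periodization",
                            maxlevel=level)
    bands = []
    for node in wp.get_level(level, order="freq"):    # low to high frequency
        single = pywt.WaveletPacket(data=None, wavelet=wavelet,
                                    mode="periodization")
        single[node.path] = node.data                 # keep one band only
        bands.append(single.reconstruct(update=False)[:len(x)])
    return np.array(bands)                            # shape: (2**level, len(x))

# Usage: each column of `comps` is an instantaneous information vector (IIV).
t = np.arange(0, 0.1, 1e-4)
signal = np.sin(2 * np.pi * 60 * t)                        # smooth 60-Hz waveform
signal[600:] += 0.05 * np.sin(2 * np.pi * 900 * t[600:])   # small fault transient
comps = orthogonal_components(signal)
iiv = comps[:, 650]                # eight-component IIV shortly after the onset
```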
In Fig. 4, we show a scheme to generate four orthogonal components. Each additional level doubles the number of components generated and (approximately) reduces their bandwidth by a factor of two. This uniform distribution of the bands appears to be the best choice in the absence of any information about the process. If additional information, or a model, is available, one can design nonuniform bands while retaining the orthogonality of the signals. As opposed to a conventional Fourier analysis, the orthogonal components retain the time information of the original signal. Hence one can use them to localize the fault. On the other hand, the compressed components have undergone successive levels of decimation. This fact is important for the generation of compact signatures of the various faults.

A. The Fault Indicator: Instantaneous Information Vector

As mentioned before, our goal is to generate a fault indicator that does not depend on the availability of a model of the dynamic system. Here, we use the instantaneous values of the resolved components to construct instantaneous vectors. Let us call them instantaneous information vectors (IIV's). Basically, we can treat these similarly to the residual vector generated in the model-dependent cases. During the course of on-line operation, we can realistically assume that most of the time
the situation would be fault-free. Thus the IIV's will have components with small values resulting from the numerical steps involved in the orthogonal decomposition process. These values we will call "noise"; they are not random noise in the sense that they do not come from the measured data. Rather, they are an artifact of the wavelet decomposition. As such, our detection net must be taught to ignore them. However, when a set of faulty data is processed by the filter bank, the resulting IIV will contain one or more components with large nonzero values. Each different type of fault will produce a different transient "signature," which will be reflected in that particular IIV. The idea is for the detection net to produce a "yes" output as soon as it encounters an IIV that represents a faulty situation.

III. DECISION MAKING

The second part of the task of fault detection involves decision-making, which is usually done by statistical tests. The two basic hypotheses are $H_0$, the null hypothesis, which means a fault does not exist, and $H_1$, the alternate hypothesis, which means a fault exists. If a satisfactory and efficient hypothesis tester is available, it can be used as Module II. If a fault indicator has only one component (as in the residual-based method applied to a single-output system), a complicated Module II (such as the neural-net-based decision-maker) will be unnecessary; in such cases Module II should contain a simple hypothesis tester. In this paper we develop a general methodology which can be used for multioutput systems as well as single-output ones. Even for single-output systems, if Module I is a multirate filter bank (that is, when a system model is not being used to generate residuals), its output will be a vector, thereby requiring a multiple hypothesis tester as Module II. Hence the development of the neural-net-based decision-maker. These are the reasons for using neural networks (instead of conventional statistical hypothesis testing) to carry out the decision-making task:

• For a multioutput system, multiple hypothesis testing becomes difficult to design.

• The usual approach of using the joint probability density function (to decide whether a random vector has deviated from its expected mean value) may mask deviations of the individual components, which may actually carry important information.

• Biological neurons are natural hypothesis testers; in fact, that is their basic function. Thus a neural network should be ideally suited for the task of multiple hypothesis testing.

• Biological neurons carry out hypothesis testing without explicit knowledge of the statistical distributions and mathematical models involved: they store implicit models by way of synaptic weights, which depend on the past experience of the neurons. Artificial neural networks function on the same principle.

Preliminary successful tests of a neural network as decision-maker were reported by one of the present authors in [7]. In [7], a three-layer neural network was designed to work as
Fig. 5. The detection network.
a robust decision-maker in conjunction with a Kalman filter (which gave fast, on-line estimates of three-phase power system quantities). However, in the present paper the idea is to replace the Kalman filter (which requires a system model and generates a residual vector as the fault indicator) by a multiresolution filter bank (which does not require a system model, but can generate fault-indicator vectors) as described in the previous section. The detection net described in [7] can be utilized for our purpose by replacing its inputs (modified residuals from a Kalman filter) by the IIV's. For the sake of completeness, we first outline the basic operation of the Kalman-filter-based detection network. Then we introduce a modification to use the IIV's as the inputs to this network. Actually, since each of the three modules of our proposed methodology is self-contained, the detection module can be used successfully with conventional residuals, with IIV's as defined by us, or with any other suitable fault indicators. Fig. 5 is a conceptual representation of the scheme. This network can be used for both model-free and model-dependent versions, as shown in the next two sections. The model-dependent case is illustrated through an example of a three-phase power system.

A. Model-Dependent Case

Here, the three-phase voltage measurements are modeled as sinusoids with a known frequency (which is assumed to be the fundamental power frequency), but with unknown amplitude and unknown phase angle. Gaussian observation noise is assumed to be present, and the amplitude and phase angle are assumed to be constants with small random walk components. The three-phase voltage system is modeled as a multioutput system, with a state vector consisting of six components. Each phase voltage is a sinusoid of the form $A\sin(\omega t + \phi)$, which can be rewritten for each time step in the standard system-theoretic format

$$y_k = H_k x_k + v_k \qquad (1)$$
where the observation matrix3 $H_k$ is time-varying, $x_k$ is the six-component state vector carrying the amplitude and phase information of the three phases, and $v_k$ is the observation noise. During unfaulted operation, the amplitude $A$ and phase angle $\phi$ of each phase remain constant, except for small fluctuations due to random disturbances; therefore, we have the state equation

$$x_{k+1} = x_k + w_k. \qquad (2)$$

Both $w_k$ and $v_k$ are zero-mean, uncorrelated Gaussian noise processes. While generating the simulated data sequence, the amplitude is taken to be 1 p.u., the angle of the first phase is taken to be zero, and the other two phases are shifted by 120° each, assuming a balanced system under nominal operating conditions.

3 For details of the observation matrix see [7].

A Kalman filter is designed for estimating the states of the above system. The filter uses the state model and the observation model, but does not have any knowledge of the actual values of the amplitude and phase angle. The Kalman filter equations are standard; they are not repeated here, except the equation for the residual, which is known to be a Gaussian random variable with mean zero and a known (actually, computed in the Kalman filter algorithm) covariance matrix. The residual is defined as

$$r_k = y_k - H_k \hat{x}_{k|k-1} \qquad (3)$$

where $\hat{x}_{k|k-1}$ is the Kalman estimate of the state vector at step $k$, given $k-1$ measurements. The residual is normalized to yield a zero-mean, unit-variance random vector. The elements of this normalized residual vector are then squared and used as inputs to the fault-detecting neural network. Under normal operating conditions (since the normalized residual has unit variance), the expected value of each input to the neural network is unity. This property is utilized (through a convenient constraint on the weights) in the design of the firing thresholds [see (4) and (5)]. Each consultant neuron (these are the hidden-layer neurons) receives the full set of squared normalized residuals. Squaring is important because 1) we are concerned with the magnitudes of these random numbers and 2) if the weights associated with each input are equal, then the sum of these squared Gaussian random variables would have the Chi-squared distribution, which would provide us with a baseline for comparing this technique with the conventional techniques. Each consultant neuron is a hybrid between a linear combiner and a hardlimiter. Their firing thresholds vary according to the "strictness" assigned to each one. The idea is to implement this network as a team of experts making decisions based on prior experience with a given system. In our simulations, the training was done using the batch-mode delta rule, in the MATLAB environment. The training of this network is subject to the following constraints:

$$\sum_{j=1}^{n} w_{ij} = 1 \qquad (4)$$

for each consultant neuron $i$, where $n$ is the dimension of the input vector, and

$$w_{ij} \ge 0 \quad \text{for all } i, j. \qquad (5)$$

With the above constraints, the expected value of the internal potential of each consultant neuron $i$, under the no-fault condition, can be found:

$$E\left[\sum_{j=1}^{n} w_{ij} r_j^2\right] = \sum_{j=1}^{n} w_{ij} = 1. \qquad (6)$$
This allows us to normalize the firing thresholds of the neurons so that they have a lower limit of unity. The constraints are implemented at each step of iteration during training. After the training converges, the weights are kept fixed. From then onwards, the decision-making network is ready to be used. It should be noted that we trained the network on-line, assuming that during the beginning of the process (the training phase) the system would run without fault for a long enough time to permit the training to be completed.

This detection network was tested extensively for various amounts of jumps in the voltage amplitudes. The detailed results can be found in [7]; it is noted here that even though the successful detection rate was high for larger faults, small faults (changes of less than about 15% of the prefault value) gave a successful detection rate of only about 50%. The response of the output neuron when a fault (a sudden drop in voltage amplitude) occurs is shown in Fig. 6. This is an example of immediate detection, when the voltage amplitude of one phase was suddenly dropped to 0.8 p.u.

B. Model-Free Case

The development of the model-free fault indicators (the IIV's) requires us to modify the hypothesis testing scheme so that we replace the residuals by IIV's. Since the IIV's are not, strictly speaking, Gaussian random variables, we cannot use the squared values and arrive at a weighted Chi-squared distribution to assist us in the design of the firing thresholds for the neurons, as we did in the model-based case. However, as in many other neural-net applications, we can use an experience-based method to decide what threshold ranges we need to use for distinguishing between faulted and unfaulted IIV's. It is known that during a transient in the original system, the IIV will contain some large components. These components are oscillatory in nature; therefore we choose to concentrate on their magnitude only. The following steps are needed to prepare the detection net for operation (a sketch of the resulting detector follows this list).

• Use the absolute values of the IIV's as inputs.

• By experimentation with normal (no-fault) data records, choose the firing thresholds of the neurons so that they do not fire during normal operation.

• Train the network with two classes of examples: IIV's generated by no-fault cases, and IIV's generated by faulty cases. Any type of fault should result in the "on" response of the final decision-making neuron.
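The sketch below is one plausible realization of this detection module under the assumptions above: consultant neurons with nonnegative weights that sum to one [constraints (4) and (5)], hard-limiting against individual strictness thresholds, and a majority-vote output neuron. The weights, thresholds, and test vectors are illustrative placeholders; in the method described here they would come from delta-rule training on no-fault and faulty records.

```python
import numpy as np

class DetectionNet:
    """Team-of-consultants hypothesis tester (a sketch of Module II).  Each
    consultant neuron forms a weighted sum of the inputs (|IIV| components in
    the model-free case, squared normalized residuals in the model-based case)
    and fires when the sum exceeds its own "strictness" threshold; the output
    neuron declares a fault when a majority of consultants fire."""

    def __init__(self, weights, thresholds):
        w = np.clip(np.asarray(weights, dtype=float), 0.0, None)   # (5): w_ij >= 0
        self.weights = w / w.sum(axis=1, keepdims=True)            # (4): rows sum to one
        self.thresholds = np.asarray(thresholds, dtype=float)

    def __call__(self, indicator):
        """indicator: one fault-indicator vector (an IIV or a residual sample)."""
        potentials = self.weights @ np.abs(indicator)   # linear-combiner part
        votes = potentials > self.thresholds            # hard-limiter part
        return votes.mean() > 0.5                       # True = "fault detected"

# Usage with illustrative numbers: 8-band IIVs and 4 consultants of increasing
# strictness; real weights and thresholds would come from training on records.
rng = np.random.default_rng(1)
net = DetectionNet(weights=np.ones((4, 8)), thresholds=[0.1, 0.2, 0.3, 0.4])
quiet_iiv = 0.01 * rng.standard_normal(8)       # decomposition "noise" only
faulty_iiv = quiet_iiv.copy()
faulty_iiv[5] = 4.0                             # fault energy in one band
print(net(quiet_iiv), net(faulty_iiv))          # False True
```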
Fig. 6. Output of the final decision-making neuron.
Fig. 7. Load voltage: no fault case.
IV. CLASSIFICATION OF FAULTS
The task of the classification network is to cluster detected faults into separate classes. As such, it is a pattern recognition problem. Self-organizing neural networks have been successfully used as pattern classifiers [20], [21] in various contexts. Self-organization refers to learning without external examples; this is also called unsupervised learning. Given a set of input patterns, neighboring processing units
(neurons or cells) in a self-organizing neural network develop into detectors of specific categories of patterns. In that sense, each local cell group acts like a decoder for the inputs. To use a self-organizing neural net, it is necessary to collect information about various types of power system faults. Since each type of fault would have its own unique signature on the wavelet-transformed coefficients, it should be possible to cluster the cases emerging from the same types of faults,
Fig. 8. Load voltage: fault in segment three.
Fig. 9. Load voltage: fault in segment two.
and thus differentiate between different cases. Obviously, the classification net could be given a “no-fault” class, thereby eliminating the need for the detection net. However, for an on-line methodology, it is better to keep the detection net in the loop. The vast majority of data are expected to be of the no-fault type, and it would be inefficient (and computationally costly) to process all the data through the classification net, which is expected to be slower than the
detection net. Only in those instances when we already know there is a fault would it be cost-effective to switch on the classification net. Two types of self-organizing nets are commonly used: the Kohonen map [21] and the ART (adaptive resonance theory) network [20]. Because of the computational burden of the ART net, we use the Kohonen net, with a few modifications, as the chosen classifier. In the following section we describe
Fig. 10. Generator current: fault in segment two.
how we build the Kohonen network for the model-free case. However, the discussion is also valid (in principle) for cases when residuals (or any other suitable indicators that contain signatures of the fault type) are being used.

A. The Kohonen Network: Some Choices and Modifications

The neurons in a Kohonen network initially have a collection of random weights. The training vectors, one by one, are presented to the neurons. In the original form of the Kohonen net, for an input sample $x$, the "winning" neuron $i^*$ is selected by the process of similarity matching, i.e.,

$$i^*(x) = \arg\min_{j} d(x, w_j), \quad j = 1, \ldots, N \qquad (7)$$

where $N$ is the number of neurons and $d(x, w_j)$ is the distance between the vectors $x$ and $w_j$. Common practice is to use the Euclidean distance (that is, the Euclidean norm of the difference vector $x - w_j$). Once the winning neuron is found, it, and a selected neighborhood of it, is updated using the following rule:

$$w_j(t+1) = \begin{cases} w_j(t) + \eta(t)\,[x - w_j(t)], & j \in \Lambda_{i^*}(t) \\ w_j(t), & \text{otherwise} \end{cases} \qquad (8)$$

where $\eta(t)$ is the learning rate at time step $t$ and $\Lambda_{i^*}(t)$ is the chosen topological neighborhood around the winning neuron $i^*$ at time step $t$. The learning process is stopped when the shift of position of any of the output neurons, measured by the change in the weight vector associated with it, falls below a preset value. It is important to note that the neighborhood is assumed to be time-varying in the above description, even though in the original form of self-organizing feature maps it was considered to be fixed.
It has been found in many studies [22] that for best results, the topological neighborhood should be large in the beginning of the training process (the "ordering phase"), and then shrink with time so that toward the end of the process (the "convergence phase") it includes only the closest neighbors of the winning neuron. The usual practice is to let the radius of the neighborhood shrink linearly with each update. Besides this time-shrinking nature, the neighborhood can also have lateral shrinkage. It has been demonstrated [22] that in biological neurons there is lateral interaction: this means that when a neuron is firing, it excites other neurons in its closest neighborhood more than those farther away from it. To incorporate this feature in the algorithm, usually the neighborhood around the winning neuron is made to decay gradually [23], [24]. One of the typical choices is to let the amplitude of the topological neighborhood (centered on the winning neuron) decay according to a unimodal Gaussian rule. This means that the weight update is the strongest for the winning neuron, and becomes weaker with increasing lateral distance.

1) Choice of the Distance Measure and Some Other Implementation Issues: The idea of similarity matching between vectors can be quite complex. While it is common practice to use the Euclidean distance (that is, the norm of the difference of two vectors) as a measure of closeness, it can be argued that it is really a measure of magnitudinal similarity of vectors. Another measure of similarity, that of directional similarity, is the inner product of two vectors. Take, for example, three vectors $v_1$, $v_2$, and $v_3$: according to magnitudinal similarity the pair $(v_1, v_2)$ may be the closer one, while according to directional similarity the pair $(v_1, v_3)$ may be the winner. In the context of using wavelet-generated IIV's (as defined in the previous section) as power
system fault indicators, the magnitude of the vector is not as important a feature as the information "which component of the vector shows the large nonzero value"; it is the alignment of the vectors that is more important. Thus, in our specific application we use the inner product as a measure of closeness between vectors.

We have introduced some other modifications to the Kohonen net for the specific application that we are interested in. These are described below; a sketch combining them in code follows the list.

• Use of prior knowledge: The training of the Kohonen net results in the construction of a knowledge base derived from historical information regarding the system in question. In many practical situations, even though the system may be difficult to model, real information is available (voltage and/or current records of faulty cases) where the fault type and location were known, or were later found out. If we can construct fault classes by processing these data with the multiresolution filter bank and produce IIV's corresponding to each known fault class, we will have some examples of each fault type for the Kohonen net. Then the Kohonen net will be trained to cluster similar faults and produce exemplars for each type. Once trained, the net can be used as a ready-made classifier.

• Experience-based learning rate: The common practice in Kohonen nets is to slowly reduce the learning rate for the updating of the exemplar vectors. This means that with each epoch of input-pattern presentation, the learning rate becomes smaller. In the so-called "conscience learning," there is an added feature so that the more times a neuron wins, the less capable it becomes of capturing future inputs. In our modification, the learning rate $\eta$ is a decreasing function of the number of patterns captured by the particular neuron. In simple terms, this means that while the neurons are not penalized for winning, the more patterns they win, the slower they adjust their existing exemplar. If we assume that capturing a new pattern is like gaining a new experience, then our conjecture is that a new experience should make less of an impact on an already-experienced neuron than on an inexperienced one. This particular modification is a novel feature of this work.

• Normalizing the exemplar vectors: Usually, when the inner product is used as the measure of closeness between vectors, the input vectors are normalized by their own lengths. However, instead of normalizing the input pattern lengths, we choose to normalize the exemplars for the clusters. The reason behind this shift is that in our specific context, the magnitudes of the oscillations in the IIV's actually contain useful information regarding the fault type, so these magnitudes cannot be ignored. However, the clusters should not be able to win a new pattern due to their own large magnitude: we are concerned with "directional" similarity between vectors. Therefore, after each update, the length of the exemplar is readjusted to unity. We have not encountered this modification in the literature.
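A minimal sketch of a classifier combining these modifications is given below: winner selection by inner product, an experience-based learning rate that decays with the number of patterns a neuron has captured, and renormalization of the exemplar (not the input) after each update. The topological-neighborhood update of (8) is omitted for brevity, and the class prototypes and training data are illustrative, not taken from the paper.

```python
import numpy as np

class ModifiedKohonen:
    """Kohonen-type classifier with the modifications listed above: inner-product
    matching, an experience-based learning rate, and unit-length exemplars
    (inputs are NOT normalized, so their magnitudes still carry information)."""

    def __init__(self, n_neurons, dim, eta0=0.5, seed=0):
        rng = np.random.default_rng(seed)
        w = rng.standard_normal((n_neurons, dim))
        self.exemplars = w / np.linalg.norm(w, axis=1, keepdims=True)
        self.wins = np.zeros(n_neurons, dtype=int)     # experience counters
        self.eta0 = eta0

    def winner(self, x):
        # Directional similarity: the largest inner product wins, replacing the
        # Euclidean distance of (7).
        return int(np.argmax(self.exemplars @ x))

    def train_step(self, x):
        i = self.winner(x)
        eta = self.eta0 / (1.0 + self.wins[i])                   # experience-based rate
        self.exemplars[i] += eta * (x - self.exemplars[i])
        self.exemplars[i] /= np.linalg.norm(self.exemplars[i])   # renormalize exemplar
        self.wins[i] += 1
        return i

# Usage with made-up IIV data: two fault types whose energy sits in different bands.
rng = np.random.default_rng(1)
fault_a = np.abs(rng.normal([0, 0, 4, 0, 0, 0, 0, 0], 0.3, (50, 8)))
fault_b = np.abs(rng.normal([0, 0, 0, 0, 0, 3, 1, 0], 0.3, (50, 8)))
net = ModifiedKohonen(n_neurons=4, dim=8)
for iiv in rng.permutation(np.vstack([fault_a, fault_b])):
    net.train_step(iiv)
# After training, net.winner(new_iiv) returns the cluster (fault class) index.
```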
2) Training of the Kohonen Net: Training of the Kohonen net would consist of processing a large number of IIV's which are generated by the filter bank in Module I. The filter bank, of course, would need real (or simulated) data records.4 From these records, we gather a set of examples of different types of faults. It is known that each fault class can be associated with a known fault type. Once these IIV's are classified, the Kohonen net in effect functions as a knowledge base. During the subsequent on-line operation of the net, any new fault that cannot be identified by the existing classes could be stored and later added to the knowledge base.

4 This does not mean that one would need to introduce faults deliberately into the real system. Recorded fault data from real faults that periodically occur in the system would be sufficient.

V. SUMMARY OF SIMULATION STUDIES
The goal of the computer experiments was to investigate and illustrate the model-free method of generation of fault indicators. (We assume that the model-based methods are well established, although they are less widely applicable than the proposed model-free technique.) Simulation experiments were performed on a transmission line modeled using SIMULINK. The following experiment is a representative case among many runs of the experiments. Note that the system model is used solely to generate the simulated system response: the operation of Module I (generation of fault indicators) requires only the response, not the model.

The system in our study is a three-segment model of a transmission line. Each segment is represented by a line impedance (resistance plus inductance) and a line-to-ground admittance (conductance and capacitance). In the experiment described here, all three segments have identical parameters. The faults considered are partial short circuits to ground, emulated by increasing the line-to-ground admittance from a prefault value of 0.001 to a postfault value of 0.1. For comparison, the load has a resistance of 0.5. Hence, the fault is relatively minor in magnitude, but it is a sustained fault. Four cases are considered: the normal unfaulted performance and three faulty lines. The same type of fault was applied one by one to each of the three segments. In all cases, the variables monitored were the current at the generation point and the voltage at the load. The data were processed using a bilateral filter bank based on Daubechies' wavelet of order ten. The results included here have separated the signal into eight components. This eight-component vector is identified as the IIV and is displayed in the following figures. For brevity, we present only decompositions for the following cases: 1) no fault; 2) load voltage for fault in segment three; 3) load voltage for fault in segment two; and 4) generator current for fault in segment two. Also, for more clarity, only the behavior in the neighborhood of the fault time is shown.
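For concreteness, the sketch below shows one way such a three-segment line with a switchable shunt fault could be simulated numerically (we use Python here; the paper's experiments were done in SIMULINK). Only the shunt admittances 0.001/0.1 and the 0.5-ohm load come from the description above; the per-segment R, L, C values, the 1 p.u. 60-Hz source, and the time step are illustrative assumptions.

```python
import numpy as np

# Illustrative per-segment parameters: the paper specifies only the shunt
# admittances (0.001 prefault, 0.1 faulted) and the 0.5-ohm load; R, L and C
# below are assumptions chosen to give a plausible, well-behaved line model.
R, L, C, RLOAD = 0.05, 1e-3, 1e-5, 0.5

def line_matrix(G):
    """State matrix of the three-segment line.  State x = [i1, i2, i3, v1, v2, v3]:
    series (inductor) currents and node (capacitor) voltages."""
    A = np.zeros((6, 6))
    for k in range(3):
        A[k, k] = -R / L                 # series R-L branch of segment k+1
        A[k, 3 + k] = -1.0 / L           # node voltage at the end of the segment
        if k > 0:
            A[k, 2 + k] = 1.0 / L        # driven by the upstream node voltage
        A[3 + k, k] = 1.0 / C            # charging current into node k+1
        A[3 + k, 3 + k] = -G[k] / C      # shunt conductance (the fault path)
        if k < 2:
            A[3 + k, k + 1] = -1.0 / C   # current drawn by the next segment
    A[5, 5] -= 1.0 / (RLOAD * C)         # load resistor at the last node
    return A

def simulate(fault_segment=None, t_fault=0.05, t_end=0.1, dt=1e-5):
    """Backward-Euler simulation; a fault steps one segment's shunt conductance
    from 0.001 to 0.1 (a sustained partial short circuit to ground)."""
    G, x, faulted = np.full(3, 0.001), np.zeros(6), False
    step = np.linalg.inv(np.eye(6) - dt * line_matrix(G))
    gen_i, load_v = [], []
    for t in np.arange(dt, t_end, dt):
        if fault_segment is not None and not faulted and t >= t_fault:
            G[fault_segment], faulted = 0.1, True
            step = np.linalg.inv(np.eye(6) - dt * line_matrix(G))
        b = np.zeros(6)
        b[0] = np.sin(2 * np.pi * 60 * t) / L      # 1 p.u., 60-Hz source feeds segment one
        x = step @ (x + dt * b)
        gen_i.append(x[0])                         # current at the generation point
        load_v.append(x[5])                        # voltage at the load
    return np.array(gen_i), np.array(load_v)

# Usage: these records are what Module I processes (e.g., with the filter bank
# sketched earlier); the fault-indicator stage never sees the model above.
i_gen, v_load = simulate(fault_segment=1)          # fault in segment two
```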
With this case study, we want to highlight the following points.

1) The decomposition did not use any knowledge about the system and was based entirely on processing the data with a fixed digital filter bank based on Daubechies' compact-support wavelets. Hence, it supports the validity of the model-free approach to fault detection. It is expected that an optimized design will yield significantly better performance.

2) Fig. 7 shows that, for the unfaulted case, only one component of the IIV is nonzero. This is the lowest frequency component. It confirms our observation about smooth signals yielding components in the low-frequency range. From the point of view of detection, it is a very convenient situation since only one band reacts to the unfaulted data. We do not expect that situation to occur with uniform frequency partitioning in a more general case. However, we expect that one can design a sufficiently fine filter bank in such a way that the effect of the fault will appear in only a few selected bands which would not be significantly affected by the normal signals. This separation would favor the detection of faults.

3) Fig. 8 shows strong variations in the IIV components. Each component has a distinctive behavior according to its localization in the frequency domain. One can make use of this distinctive behavior to classify faults.

4) Using a common dimensionless scale for the high-frequency components, Figs. 9 and 10 illustrate the fact that some variables are more sensitive to a given fault. In this case it is apparent that the variation in the load voltage IIV is more significant than the variation in the generator current. They also show that some bands may be insensitive to some types of faults.

5) All figures show that the IIV's created by the filter bank show a clearly different behavior according to the location of the fault. Hence they should be able to create indicators sensitive to the type of the fault, and provide indication about their spatial location.

6) The fault simulated in this example is minor but sustained, which means it is an important detection problem. Large faults are easier to detect with high accuracy with the help of residual-based methods. The probability of correct detection falls drastically as the fault size gets smaller. In [7] it was reported that the neural-network-based detector which operates on the magnitude of residuals (generated by a Kalman filter) becomes inaccurate when the change in the amplitude of the voltage waveform is less than about 15%. By comparison, the wavelet-based detector is much more sensitive. Besides, this method does not require the availability of an explicit mathematical model of the system.

VI. OPEN RESEARCH QUESTIONS
One of the hypotheses still to be researched is that with a better choice of decomposition one can increase the detection sensitivity and make the system robust to measurement noise. The other issue is on-line implementation. More research needs to be done in the following categories.

1) The use of continuous wavelet transforms and transforms with possible hardware implementation for maximal speed.

2) The selectivity of the wavelet-based sensors. This issue is application dependent with a strong experimental component. Our ongoing research is to investigate
various types of faults in the system under study and determine the components that would allow us to distinguish among the faults.

3) The effect of measurement noise. In the preliminary experiments, the orthogonal components that carry information about the fault are very small and could easily be masked by larger measurement noise. We plan to investigate the use of signal-enhancing techniques to improve the quality of the detectors. In particular, the use of wavelet-based enhancing techniques is being researched [25].

4) Developing the entire methodology as an on-line system. Currently Module I operates off-line, so that the filters are noncausal. A major goal of future work would be to implement a "moving window" type filter bank, where old data points are discarded as new data points become available. Modules II and III are ready for on-line use.

VII. CONCLUSIONS

A modular scheme for fault detection and classification is developed in this paper. Each module can be designed in two different ways, model-based and model-free, depending on the intended application and available information. We present the model-free method for the generation of fault indicators in detail. The method utilizes multirate filter banks based on wavelet decomposition of actual data. Extensive simulation studies illustrate the use of the proposed method. The testing of these indicators is done by a decision-making neural network, which can also be adapted to both the model-based and model-free situations. A modified Kohonen-type neural network is proposed for the classification task. It is anticipated that the integrated and modular approach presented in this paper can eventually be developed into a widely applicable tool in detection and classification of faults in dynamic systems.

ACKNOWLEDGMENT

The authors are grateful for many helpful questions and suggestions from the anonymous reviewers.

REFERENCES

[1] M. M. Polycarpou and A. T. Vemuri, "Learning methodology for failure detection and accommodation," IEEE Contr. Syst. Mag., pp. 16–24, June 1995.
[2] J. Tengdin, R. Westfall, and K. Stephan, "High impedance fault detection technology," Rep. of PSRC Working Group D15, Mar. 1997. Available: http://www.rt66.com/ w5sr/psrc.html
[3] E. O. Schweitzer and D. Hou, "Filtering requirements for distance relays," in Proc. Amer. Power Conf., vol. 55-I, 1993, pp. 296–301.
[4] A. Girgis and E. B. Makram, "Application of adaptive Kalman filtering in fault classification, distance protection, and fault location using microprocessors," IEEE Trans. Power Syst., vol. 3, pp. 301–309, Feb. 1988.
[5] F. N. Chowdhury, J. P. Christensen, and J. L. Aravena, "Power system fault detection and state estimation using Kalman filter with hypothesis testing," IEEE Trans. Power Delivery, vol. 6, pp. 1025–1029, July 1991.
[6] J. L. Pinto de Sa and L. Pedro, "Modal Kalman filter-based impedance relaying," IEEE Trans. Power Delivery, vol. 6, pp. 78–84, Jan. 1991.
[7] F. Chowdhury, "On-line fault detection in multioutput systems using Kalman filter and neural network," in Proc. Amer. Contr. Conf., vol. 2, June 1994, pp. 1729–1731.
[8] A. Ferrero, S. Sangiovanni, and E. Zappitelli, "A fuzzy-set approach to fault-type identification in digital relaying," in Proc. IEEE Conf. Transmission and Distribution, Apr. 1994, pp. 269–275.
[9] P. A. Crossley and P. G. McLaren, "Distance protection based on travelling waves," IEEE Trans. PAS, vol. PAS-102, no. 9, pp. 2971–2983, Sept. 1983.
[10] A. Sharaf, "ANN-based pattern classification of synchronous generator stability and loss of excitation," IEEE Trans. Energy Conv., vol. 9, no. 4, pp. 753–759, Dec. 1994.
[11] A. Mazroua, "Neural-network system using the multilayer perceptron technique for the recognition of PD pulse shapes due to cavities and electrical trees," IEEE Trans. Power Delivery, vol. 10, pp. 92–96, Jan. 1995.
[12] H. Yang, W. Chang, and C. Huang, "Online fault diagnosis of power substation using connectionist expert system," IEEE Trans. Power Syst., vol. 10, pp. 323–331, Feb. 1995.
[13] S. Eborn, D. L. Lubkeman, and M. White, "A neural-network approach to the detection of incipient faults on power distribution feeders," IEEE Trans. Power Delivery, vol. 5, pp. 905–912, Apr. 1990.
[14] K. Nishimura and M. Arai, "Power system state evaluation by structured neural network," in Proc. IJCNN'90, vol. 1, June 1990, pp. 271–277.
[15] D. Lubkeman, C. Fallon, and A. Girgis, "Unsupervised learning strategies for detection and classification of transient phenomena on electric power distribution systems," in Proc. 1st Int. Forum Applicat. Neural Networks to Power Syst., Seattle, WA, June 1991, pp. 107–111.
[16] K. S. Swarup and H. S. Chandrasekharaiah, "Fault detection and diagnosis of power system using artificial neural networks," in Proc. 1st Int. Forum Applicat. Neural Networks to Power Syst., Seattle, WA, June 1991, pp. 102–106.
[17] D. Robertson, O. I. Camps, and J. S. Mayer, "Wavelets and power system transients: Feature detection and classification," in Proc. SPIE Int. Symp. Opt. Eng. Aerospace Sensing, vol. 2242, Apr. 1994, pp. 474–487.
[18] S. Mallat and W. L. Hwang, "Singularity detection and processing with wavelets," IEEE Trans. Inform. Theory, vol. 38, pp. 617–643, Mar. 1992.
[19] A. K. Soman and P. P. Vaidyanathan, "On orthonormal wavelets and paraunitary filter banks," IEEE Trans. Signal Processing, vol. 41, pp. 1170–1183, Mar. 1993.
[20] G. Carpenter and S. Grossberg, "A massively parallel architecture for a self-organizing neural pattern recognition machine," in Neural Networks, Theoretical Foundation and Analysis, C. Lau, Ed. Piscataway, NJ: IEEE Press, 1992.
[21] T. Kohonen, "The self-organizing map," in Neural Networks, Theoretical Foundation and Analysis, C. Lau, Ed. Piscataway, NJ: IEEE Press, 1992.
[22] S. Haykin, Neural Networks: A Comprehensive Foundation. New York: Macmillan, 1994.
[23] H. Ritter, Neural Computation and Self-Organizing Maps: An Introduction. Reading, MA: Addison-Wesley, 1992.
[24] Z. Lo, M. Fujita, and B. Bavarian, "Analysis of neighborhood interaction in Kohonen neural networks," in Proc. 6th Int. Parallel Processing Symp., Los Alamitos, CA, 1991, pp. 247–249.
[25] W. K. Awadzi, "Feature enhancement via the wavelet transform and quadrature mirror filters," M.S. thesis, Louisiana State Univ., Baton Rouge, 1994.
Fahmida N. Chowdhury (S’86–M’87) received the combined B.Sc. and M.Sc. degree in electromechanical engineering from Moscow Power Engineering Institute, Moscow, Russia, and the Ph.D. degree in electrical engineering from Louisiana State University, Baton Rouge. She is currently an Assistant Professor of Electrical and Computer Engineering at the University of Southwestern Louisiana, Lafayette. Her research interests include neural networks, modeling, estimation, and detection problems in stochastic systems, applications of systems theory approaches to power systems, and probabilistic interpretations of robustness issues. Her educational interests focus on developing interdisciplinary courses. Dr. Chowdhury is a reviewer for the IEEE TRANSACTIONS ON SIGNAL PROCESSING and the IEEE TRANSACTIONS ON NEURAL NETWORKS, and a member of the Conference Editorial Board of the IEEE Control Systems Society. She also reviews proposals for the National Science Foundation.
Jorge L. Aravena (M'89) received the degree of Civil Electrical Engineer from the University of Chile at Santiago, and the Ph.D. degree in computer, information, and control engineering from the University of Michigan, Ann Arbor. Currently he is the Graduate Studies Coordinator for the Department of Electrical and Computer Engineering at Louisiana State University. He has published more than 30 refereed journal papers and 100 conference papers. His current areas of research include digital signal and image processing, m-D system theory, computer-based control, and parallel algorithms and computing structures. His research in nonplanar computing structures and fast parallel representation of filtering algorithms has been supported by the State of Louisiana. Dr. Aravena is a frequent reviewer for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, Signal Processing, and Parallel and Distributed Processing. He also reviews proposals for the National Science Foundation and has been invited as a national panel member to review Research Initiation Awards in Microelectronics Information Processing.