Intrusion detection through SCADA systems using fuzzy logic-based state estimation methods

Keith E. Holbert*
Department of Electrical Engineering, Arizona State University, Tempe, Arizona 85287–5706, USA
E-mail: [email protected]
*Corresponding author

Amitabh Mishra
Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, Virginia 24061, USA
E-mail: [email protected]

Lamine Mili
Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Alexandria Research Institute, Alexandria, VA 22314, USA
E-mail: [email protected]

Abstract: Supervisory Control And Data Acquisition (SCADA) systems represent a vulnerability in vital infrastructures. For example, an electric power system is subjected to intrusions via its SCADA systems; however, the instrumentation provides detectable variations in response to such interference. Presented herein is a strategy that augments state estimation methods using a Hybrid Fuzzy System for fault monitoring and diagnosis that aims to combine information from multiple domains in order to detect, isolate, identify, and mitigate threats to the system. Furthermore, to endow the state estimation solution methods with some degree of numerical robustness, algorithm-based error detection (ABED) is applied to the Gaussian elimination procedure. Simulation results revealed that ABED provides error detection at low cost and excellent error coverage for floating point arithmetic in the presence of permanent bit and word errors, while being free of false alarms and insensitive to both data range and data size.

Keywords: Algorithm Based Error Detection (ABED); Algorithm Based Fault Tolerance (ABFT); fuzzy logic; instrumentation fault characterisation; sensor anomaly detection.

Reference to this paper should be made as follows: Holbert, K.E., Mishra, A. and Mili, L. (2007) ‘Intrusion detection through SCADA systems using fuzzy logic-based state estimation methods’, Int. J. Critical Infrastructures, Vol. 3, Nos. 1/2, pp.58–87.



Biographical notes: Keith Holbert is currently an Associate Professor in the Electrical Engineering Department of Arizona State University. He earned his PhD in Nuclear Engineering from the University of Tennessee in 1989. His research interests focus on instrumentation, signal processing, and fuzzy logic applications. Dr. Holbert is a registered Professional Engineer. He has published more than 60 journal and conference papers.

Amitabh Mishra is currently an Associate Professor of Electrical and Computer Engineering at Virginia Tech where he conducts research in the broad area of computer communications, primarily focusing on the architecture and performance of wireless cellular, ad hoc, and sensor networks. His research is supported by NSF, NSA, and DARPA. He obtained his MS in Computer Science from the University of Illinois at Urbana-Champaign, and a PhD in Electrical and Computer Engineering from McGill University.

Lamine Mili is a Professor of Electrical and Computer Engineering at the Advanced Research Institute of Virginia Tech. He received an Electrical Engineering Diploma from the Swiss Federal Institute of Technology, Lausanne, in 1976, and a PhD from the University of Liege, Belgium, in 1987. Dr. Mili is a senior member of the Power Engineering Society of IEEE, and the recipient of a 1990 NSF Research Initiation Award and a 1992 NSF Young Investigator Award. His research interests include robust statistics, statistics of extreme events, risk assessment and management, reliability analysis, power system analysis and control, and multifunction radar systems.

1 Introduction

It is well known that in the post-September 11, 2001 environment, industries have moved to higher levels of security, especially that of information security. Among these improvements is the upgrading of corporate intranet and internet access pathways with firewalls and other protective technologies. An area receiving less attention is that of real-time control systems. Monitoring of large infrastructures such as gas pipelines and electric power systems is largely accomplished using geographically dispersed instrumentation. This instrumentation not only gives a view of the system to operators but it also provides adversarial agents with a means to disrupt operations, or even destroy equipment. Consider an example from the rail transportation sector: in October 1995, with some technical skill, saboteurs were able to remove spikes and realign tracks, and then rewire the emergency warning system; subsequently, an Amtrak passenger train derailed, killing one person and injuring approximately 100 other people (Waggoner, 2003).

Supervisory Control And Data Acquisition (SCADA) systems are commonly employed in the natural gas, electric power, water supply, transportation, and financial industries. A SCADA system generally consists of four major components: field instrumentation, remote stations, a communications network, and a central monitoring and control station. In an overall SCADA system, the quantities measured pass through transducers and converters, and are ultimately telemetered to the control system over communication links. Many systems deploy Remote Terminal Units (RTUs), which are special purpose computers typically containing analogue-to-digital converters, digital-to-analogue converters, digital (status) inputs and (control) outputs, and multiple communications ports.


Recently, RTUs are being replaced by Intelligent Electronic Devices (IEDs) in the substations of electric power systems. Unlike an RTU, an IED is a multi-purpose computer that integrates sensing, protection, and control functions (McDonald, 2003).

Prior to 11 September 2001, utilities were seeking to increase corporate-wide connectivity to SCADA-obtained information using technology such as the internet and intranet; for example, see Zecevic (1998), Ebata et al. (2000), Medida et al. (1998) and Denzer (1997). In addition, a decade ago, emphasis was being placed on ‘open’ architectures (Martire and Nuttall, 1993; Klein and Menendez, 1993), for example, the fieldbus. Fieldbus is a special purpose Local Area Network (LAN) used to connect various devices including sensors, actuators, transmitters, programmable controllers, and processors (Thomesse, 1998). There has been limited work in securing internet access to a fieldbus via a gateway approach (Sauter and Schwaiger, 2002) and almost no interest in the security of the fieldbus itself (Schwaiger and Treytl, 2003). More recently, activities related to securing SCADA systems have been on the rise; for instance, see Oman et al. (2000), Dagle (2001), Pollet (2002), Dagle et al. (2002), Abshier and Weiss (2004) and Byres and Lowe (2005). In particular, Williams (2003) elaborates on some of the security vulnerabilities of a SCADA system in the context of a petroleum storage facility that is protected against nuclear, bacteriological, and chemical attack.

Many institutions in the USA are reluctant to report instances of intrusions into SCADA-type systems. Public acknowledgement could detrimentally affect investor confidence in the utility as well as bring unwanted government oversight or investigation (e.g., from public utility regulators). However, when an intruder is prosecuted, the incident receives significant visibility in the media. For example, in Queensland, Australia, a disgruntled former consultant used the internet, a wireless radio and stolen control software to release up to one million litres of raw sewage into a river and coastal waters (Lemos, 2002).

The signal/data transmission from the sensors to the monitoring and control system must be accomplished in real time. SCADA systems utilise a variety of communications media including telephone, fibre optic, radio, and even satellites. Retrofitting existing systems with authentication of each individual datum would not only be expensive but could also deteriorate signal bandwidth to the point of non-usability. A potentially more effective means of confirming signal integrity is to assess the data after receipt by the SCADA system. Because SCADA data are often transmitted in the open using simple text constructs, one research group chose to investigate n-gram anomaly detectors to identify bad values caused by attacks and faults (Bigham et al., 2003).

Various Fault Detection and Isolation (FDI) schemes (Betta and Pietrosanto, 2000) have been developed over the years for monitoring and diagnosis of anomalies in critical systems, for example, within the aerospace industry. These FDI algorithms were originally targeted toward detecting faults due to system failure and environmental conditions. Those same techniques are now being considered for detecting intentional attacks upon the systems they monitor. Some SCADA systems employ state estimation techniques to resolve discrepancies between sensor readings.
State estimation relies on known physical relationships between the state variables of a system. These relationships can originate from analytical knowledge, empirical observations, or both.


Most researchers and practitioners recognise that a particular FDI approach is effective under a limited set of operating states and fault conditions. For this reason, some systems employ multiple FDI techniques in a modular architecture (Holbert and Upadhyaya, 1990). This approach can be likened to that of a medical check-up, in that there is not a single all-encompassing test that the physician performs to assess an individual’s health. With that thought in mind, in this paper we present research results from investigating and developing FDI methods for application in SCADA systems such as those employing state estimation. The first method utilises fuzzy logic to detect, characterise and mitigate instrumentation anomalies in power system state estimation. The second technique employs algorithm based error detection for the Gaussian elimination method. The error expressions for the round-off errors in the algebraic processes within the Gaussian elimination procedure are derived. Some simulation results carried out on various power systems are presented and analysed.

The paper is organised as follows. Section 2 extracts various statistical signatures from measurement data and then, using fuzzy logic, combines the signatures with system knowledge to detect, isolate and characterise system anomalies. Section 3 describes an intrusion detection method based on power system state estimation. Section 4 details the application of ABFT techniques to the Gaussian elimination procedure for state estimation algorithms. We then conclude the paper with overall thoughts and future prospects.

2 Fuzzy logic fault detection and anomaly characterisation

Correct estimation of the system operating states in the presence of uncertain measurement data is a crucial challenge for real-time system monitoring (Holbert and Lin, 2004). Contamination of sensor data is customarily viewed as being caused by instrument inaccuracies and failures. More recently, intrusion into computer systems has become a threat to system monitoring and sensory applications as an additional source of corrupted data. Hence, measurement uncertainty can be introduced by a variety of sources.

Human operators have an advantage over control and protection systems in terms of their experience and ability to assimilate a wide spectrum of information and new data. In contrast, computers have the advantage of being able to process such information much faster than their human counterparts can. Fuzzy Logic (FL) is an artificial intelligence tool that can take advantage of both the operators’ experience and the fast data processing capability of computers. Consequently, FL is selected in this research to develop a comprehensive fault detection system.

Control systems are critically dependent on sensing equipment for system operation, protection and maintenance. Utilities already have a wide variety of sensory information available to assess the system vulnerability to disturbances. The instrumentation system is a double-edged sword with regard to its use and application. On the one hand, it provides critical support for infrastructure monitoring, measurement, control, and operation. The downside is that sensory systems may provide a window of accessibility to external agents intending to damage the system (Padmanabh et al., 2003). The sources of vulnerability due to sensor failure include natural disasters, equipment failures, human errors, and deliberate sabotage.


Reliable knowledge of the system states requires calibrated instrumentation and robust state estimation methods. System monitoring and diagnosis, which use the sensor readings, work together in detecting abnormal outputs and thereby suggesting corrective actions (e.g., control and maintenance) that need to be taken. This requires that the readings obtained from the sensors be calibrated and reliable, and therefore motivates the use of sensor validation.

An interesting and possibly sobering shift is noted in the concerns for physical security now contemplated by electric utilities. In the past, utilities placed greater concern on non-intentional avenues of power system susceptibility; in today’s atmosphere, however, utilities are increasingly concerned with intentional intrusions, and with individuals and entities that are dedicated to bringing the system down. Such an environment is the type that has been expected by war planners, but not that anticipated by utility engineers. Hence, a rethinking is appropriate in trying to address such problems. Under such adverse conditions, the reliability of the measurements becomes more important than their absolute precision (Lin and Holbert, 2003).

Sensor validation, which is the process in which the sensor outputs are verified with respect to the system conditions, is divided into three steps as depicted in Figure 1:

1 Fault detection – determination of whether a fault has occurred

2 Fault isolation – identification of the particular sensor(s) experiencing an anomaly or failure

3 Fault characterisation – classification of the anomaly type(s) and/or the cause(s) of the failure.

A fourth step is mitigation, which can be accomplished using general state estimation techniques. These steps are analogous to the stages for investigating network security violations (i.e., detection, isolation, identification and mitigation).

Figure 1 The stages for investigating and mitigating measurement system failures (detection → isolation → characterisation → estimation)

Along the above lines, the presentation of the fuzzy logic-based failure detection system is divided into two separate discussions in the following subsections: fault detection, isolation and characterisation are addressed first, and state estimation is then discussed. A brief introduction to fuzzy logic is provided in Appendix A.

2.1 Principles of FDI

An approach to detect and characterise sensor failures is developed here using fuzzy logic. In addition, sensor anomaly characterisation is accomplished via a fuzzy logic system incorporating diverse statistical signatures. Sensor anomalies are characterised as spikes and/or jumps. The results from fault detection and characterisation of data are detailed.


The importance of detecting instrumentation failures has led to the development of several fault detection techniques. One method is hardware redundancy, which uses multiple sensors to measure the same output variable. This method suffers the disadvantages of extra cost, including maintenance, and additional space requirements to accommodate the equipment (Patton et al., 1989). Analytical redundancy (Simani et al., 2003) uses a comparison technique on sensors that measure different variables by taking advantage of the functional relationship between the state variables. This technique eliminates the need for extra hardware (Patton and Chen, 1993). Analytical redundancy can be achieved using state estimation (Patton et al., 1989), parameter estimation methods (Isermann, 1984), dedicated observer approaches (Frank, 1990), and innovation-based techniques (Mehra and Peschon, 1971). Statistical methods such as mean and variance estimations and artificial intelligence approaches such as Bayesian networks (Kirsch and Kroschel, 1994), data fusion (Park and Lee, 1993), neural networks (Himmelblau and Bhalodia, 1995), and expert systems have also been employed by various researchers.

2.1.1 Fuzzy FDI

Fuzzy logic may also be used for instrument fault detection, as it possesses the advantage of transforming linguistic information to numerical values for processing, and then later back to linguistic information (Heger et al., 1996; Holbert et al., 1994). The user can refine the boundary conditions depending on variations in the operating conditions. For example, if a signal deviates significantly from its average value, then the sensor might be considered faulty.

Fault detection and isolation are typically performed together in a monitoring system. In addition to fault detection, fault characterisation is an important component as it may lead to understanding of the type and cause of the anomaly; however, fault characterisation is not always incorporated, as it is more challenging to achieve. To specify the anomaly associated with the sensor, a characterisation stage is implemented in this work where, in addition to the anomaly type, the time at which the anomaly occurs is determined. In a previous paper (Ananthanarayanan and Holbert, 2004), the anomalies are characterised into momentary transient or jump faults. Transients (spikes) are defined as a sudden increase in the signal amplitude caused by various factors such as lightning strokes or switching activity in a system. Although short-lived, their detection is of utmost importance because they not only disrupt the system operation but they also shorten the lifetime of the equipment, and thereby increase cost. On the other hand, jumps are not short-lived but may increase or decrease the signal mean to a level and remain there for a sufficient time during which the jump presence could be detected.

The fuzzy logic-based sensor validation developed here involves identifying the faults present in a signal based on the statistical signatures and later characterising the detected anomalies based on the fault type(s) associated with them (see Figure 2). The fault characterisation incorporates several fuzzy inputs. These inputs originate from various statistical signatures including the Normalised Root-Mean-Squared (NRMS) value, the skewness, the kurtosis, the crest factor, and the zero-crossing rate. Each of these statistical signatures is fuzzified; for example, Figure 3 shows the membership functions for the three fuzzy sets – platykurtic, mesokurtic and leptokurtic – of the kurtosis.

Figure 2 Block diagram of the FDI block and the fuzzy logic anomaly characterisation modules (sensor output → calculation of statistical quantities → fault detection and isolation → fuzzy logic jump and spike classifiers → jump/spike status → anomaly characterisation)

Figure 3 Fuzzy membership functions for the kurtosis (platykurtic, mesokurtic and leptokurtic fuzzy sets with breakpoints at −0.75, −0.35, 0.35 and 0.75)
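For concreteness, the sketch below shows one way the statistical signatures and the kurtosis fuzzification of Figure 3 could be computed. It is only an illustration under stated assumptions: the normalisation choices (mean-removed signal, excess kurtosis) and the trapezoidal shape of the membership functions are assumptions, with only the breakpoints (−0.75, −0.35, 0.35, 0.75) taken from Figure 3.

import numpy as np

def signatures(x):
    """Statistical signatures used as fuzzy inputs (Section 2.1.1).

    The normalisation choices below are illustrative assumptions; the paper
    does not spell out its exact definitions.  A non-degenerate signal
    (non-zero mean and standard deviation) is assumed.
    """
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    centred = x - mu
    nrms = np.sqrt(np.mean(centred**2)) / (abs(mu) + 1e-12)          # normalised RMS
    skew = np.mean(centred**3) / sigma**3                            # skewness
    kurt = np.mean(centred**4) / sigma**4 - 3.0                      # excess kurtosis
    crest = np.max(np.abs(centred)) / np.sqrt(np.mean(centred**2))   # crest factor
    zcr = np.mean(np.diff(np.sign(centred)) != 0)                    # zero-crossing rate
    return nrms, skew, kurt, crest, zcr

def trap(u, a, b, c, d):
    """Trapezoidal membership function with corners a <= b <= c <= d."""
    if u <= a or u >= d:
        return 0.0
    if b <= u <= c:
        return 1.0
    return (u - a) / (b - a) if u < b else (d - u) / (d - c)

def fuzzify_kurtosis(k):
    """Membership degrees in the three kurtosis fuzzy sets of Figure 3."""
    return {
        "platykurtic": trap(k, -10.0, -10.0, -0.75, -0.35),
        "mesokurtic":  trap(k, -0.75, -0.35, 0.35, 0.75),
        "leptokurtic": trap(k, 0.35, 0.75, 10.0, 10.0),
    }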

2.1.2 Cumulative summation method

A lesser-known method for anomaly characterisation is the cumulative sum (cusum) approach to detect a change in the mean (Morgenstern et al., 1988). Let {x_i} be a sequence of independent Gaussian variables given as:

x_i = \bar{x}_i + e_i    (1)

where e_i is the Gaussian noise and

\bar{x}_i = \begin{cases} x_0, & i < r \\ x_1, & i \ge r \end{cases}    (2)

where r is the instant when a jump occurs. The cusum method uses two sequences to detect increases or decreases in the mean. Let V_m be the range of values that the mean could take, and H the decision interval given by:

H = \frac{-\sigma \ln(\alpha)}{V_m / 2}    (3)

where α is the false alarm probability.


To detect an increase in the mean, compute:

U_0 = 0, \quad U_k = \sum_{j=1}^{k} \left( x_j - x_0 - V_m/2 \right)    (4)

U_{\min} = \min_{0 \le j \le k} U_j; \quad \text{an increase is declared when } U_k - U_{\min} > H    (5)

Similarly, to identify a decrease in the mean value, calculate:

S_0 = 0, \quad S_k = \sum_{j=1}^{k} \left( x_j - x_0 + V_m/2 \right)    (6)

S_{\max} = \max_{0 \le j \le k} S_j; \quad \text{a decrease is declared when } S_{\max} - S_k > H    (7)

The cusum approach can be utilised to determine spikes, jumps and sinusoidal faults (Morgenstern et al., 1988). However, in our work the cusum variable is employed only to detect faults due to jumps. The stochastic gradient algorithm (Das and Hunt, 1998) is another useful tool for detecting jumps; it estimates the uncorrupted signal from a noise-corrupted one. However, other experimental results showed that the cusum algorithm outperforms this method (Ananthanarayanan and Holbert, 2003).
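A minimal sketch of the two-sided cusum test of Equations (3)–(7) is given below. The choice of the nominal mean x_0 and noise level σ (assumed here to be known, or estimated from a fault-free window) is an assumption left to the user.

import numpy as np

def cusum_jump_detect(x, x0, sigma, Vm, alpha=0.01):
    """Two-sided cusum test for a jump in the mean (Equations (3)-(7)).

    x     : measurement sequence
    x0    : nominal (pre-jump) mean, e.g. estimated from a fault-free window
    sigma : noise standard deviation
    Vm    : smallest mean shift to be detected
    alpha : false-alarm probability
    Returns the sample index at which a jump is declared, or None.
    """
    H = -sigma * np.log(alpha) / (Vm / 2.0)   # decision interval, Equation (3)
    U = Umin = 0.0                            # increase detector, Equations (4)-(5)
    S = Smax = 0.0                            # decrease detector, Equations (6)-(7)
    for k, xk in enumerate(x):
        U += xk - x0 - Vm / 2.0
        Umin = min(Umin, U)
        S += xk - x0 + Vm / 2.0
        Smax = max(Smax, S)
        if U - Umin > H or Smax - S > H:
            return k                          # jump declared at sample k
    return None

# Example: a unit jump injected halfway through a noisy signal
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(100.0, 0.5, 200), rng.normal(101.0, 0.5, 200)])
print(cusum_jump_detect(signal, x0=100.0, sigma=0.5, Vm=1.0))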

2.1.3 Fuzzy logic fault characterisation modules

Forethought was given to whether to create an overall comprehensive fuzzy logic anomaly characteriser, or to develop a separate fuzzy logic module for each anomaly type. As the latter focuses upon a single anomaly, it was anticipated that development and management of each rule base would be easier. The drawback to single modules is that cross-talk (that is, information, analysis and decision sharing) is prevented. This disadvantage is overcome with an overall combined fuzzy logic system that simultaneously examines the presence or absence of all fault types; however, the rule set of the comprehensive system could become large and cumbersome to manage. The modular approach also has the advantage that additional, possibly new, failure characterisation modules can easily be inserted, for example, a unit to characterise noise faults (Holbert and Upadhyaya, 1990). As a feasibility study, we chose to concentrate on characterising faults into only two broad categories:

1 jumps

2 spikes.


Separate spike detection and jump detection stages use fuzzy logic to identify and then characterise the faults (see Figure 2). Although this research performed FDI where a signal contains only spike or jump type anomalies, in reality other types of faults exist, for example, cyclic faults, stuck faults, and bias faults. The fuzzy logic approach can be extended to those fault types too.

Fuzzy logic jump classifier. The inputs to the jump classifier module are the zero-crossing rate, cusum and NRMS, which then characterise the nature of a jump fault, if present. For jump detection, the output status variable is either ‘jump’ or ‘no jump’. Due to its sophistication, the cusum variable is given greater weight than the other two inputs. The jump classifier consists of just four rules, an example of which is: if the zero-crossing rate is normal and the cusum is normal, then the output status is no jump.

Fuzzy logic spike classifier. The statistical quantities used for spike characterisation are the NRMS, skewness, kurtosis, crest factor, and zero-crossing rate. The output status for the spike detection is ‘spike’ or ‘no spike’. The fuzzy logic spike classifier module uses 72 rules to determine spike faults. The following is one example of those 72 rules, shown for better understanding: if the signal is positively skewed and is leptokurtic and has a high NRMS, then the output status is spike. This rule shows that, regardless of the values of the zero-crossing rate and crest factor, the remaining inputs can determine the signal status.

Combined FL jump and spike classifier. After implementing the spike and jump classifiers as separate fuzzy systems, an overall FDI system encompassing both spike and jump characterisation was developed. In this case, the set of inputs consists of all the statistical quantities previously used. The number of rules developed for the overall detection system is 36. These rules were formed based on the same assumptions made in the individual classifiers. An example rule to illustrate this is: if the signal is positively skewed and leptokurtic, and the crest factor is very high, and the cusum indicates the presence of jump(s), then (regardless of the values of the zero-crossing rate and NRMS) the output status indicates the presence of both jumps and spikes.

Signals with spikes and jumps such as those shown in Figure 4 were analysed. For the spike anomaly characterisation module, 72 fuzzy rules were developed, whereas the jump classifier required just four rules and is therefore simpler to explain. Figure 5 illustrates the Mamdani fuzzy inference process for the jump characterisation. The four rows correspond to the four rules; the four columns match up to the three fuzzy inputs and one fuzzy output. In this example, the zero-crossing rate, NRMS, and cusum take the values 0.295, 0.0342 and 0.75, respectively. The results show that only the fourth rule is activated. Results aggregation and defuzzification yield a final crisp value of 7.63, which correctly signifies the presence of a jump.

Figure 4 Examples of analysed signals: (a) signal with jump; (b) signal with spikes

Figure 5 Mamdani fuzzy inference process for the jump classifier module (inputs: zero-crossing rate = 0.295, normalised RMS = 0.0342, cumulative sum = 0.75; output jump status = 7.63)
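The sketch below illustrates a Mamdani-style jump classifier in the spirit of Figure 5. Only the first rule is stated explicitly in the text; the other three rules, the membership breakpoints and the output sets are illustrative assumptions, so the resulting crisp value will differ somewhat from the 7.63 of the worked example.

import numpy as np

# Illustrative membership functions; the breakpoints are assumptions,
# not the authors' tuned values.
def low(u, lo, hi):   return float(np.clip((hi - u) / (hi - lo), 0.0, 1.0))
def high(u, lo, hi):  return float(np.clip((u - lo) / (hi - lo), 0.0, 1.0))

def jump_classifier(zcr, nrms, cusum):
    """Four-rule Mamdani classifier (cf. Figure 5) with centroid defuzzification."""
    zcr_n, zcr_ab   = low(zcr, 0.2, 0.4),   high(zcr, 0.2, 0.4)      # normal / abnormal
    nrms_n, nrms_hi = low(nrms, 0.05, 0.2), high(nrms, 0.05, 0.2)
    cs_n, cs_hi     = low(cusum, 0.3, 0.6), high(cusum, 0.3, 0.6)

    # Rule firing strengths ('and' taken as min); the cusum input dominates,
    # as indicated in the paper.  Only the first rule is stated in the text.
    rules = [
        (min(zcr_n, cs_n),    "no_jump"),  # stated in the paper
        (min(zcr_ab, cs_n),   "no_jump"),  # assumed
        (min(nrms_hi, cs_hi), "jump"),     # assumed
        (cs_hi,               "jump"),     # assumed
    ]

    # Mamdani aggregation over a crisp output universe 0..10 (Figure 5 scale).
    y = np.linspace(0.0, 10.0, 201)
    no_jump_set = np.clip((5.0 - y) / 5.0, 0.0, 1.0)    # membership peaks near 0
    jump_set    = np.clip((y - 5.0) / 5.0, 0.0, 1.0)    # membership peaks near 10
    agg = np.zeros_like(y)
    for strength, label in rules:
        out = no_jump_set if label == "no_jump" else jump_set
        agg = np.maximum(agg, np.minimum(strength, out))  # clip each rule, take union

    return float(np.sum(y * agg) / np.sum(agg)) if agg.any() else 5.0  # centroid

# Inputs from the worked example of Figure 5
print(jump_classifier(zcr=0.295, nrms=0.0342, cusum=0.75))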

A comparison is made between the two individual fault classification systems and the overall fuzzy logic fault classifier. Although all six inputs are used in the overall fuzzy logic system, the zero-crossing rate and NRMS are not used for the detection of spikes – they are employed only in the case of jumps. This has significantly, and unexpectedly, reduced the number of rules in the overall fuzzy logic knowledge base. Data analysis has shown the overall system to be just as effective as the individual classifiers. Further, the overall system requires about half the number of rules (i.e., 36 versus 76), and hence it performs the analysis more quickly.

3 Intrusion detection through SCADA using power system state estimation

Consider an electric power system subjected to intrusions, destructive natural phenomena, and component wear and tear. The instrumentation provides detectable variations in response to the above hindrances. In the case of a change in an offset setting, possibly due to a disgruntled employee, the fuzzy logic system could detect such an occurrence as a jump. Whenever a system is subject to a momentary transient such as a lightning strike, fuzzy logic classifies this as a spike. Finally, in the case of mechanical faults (such as a loose electromechanical contact), a change in the zero-crossing rate can indicate such a malfunction. These potential faults demonstrate the applicability of fuzzy logic-based fault monitoring and diagnosis.

The sensory network and control systems are strategic components of major infrastructures, especially electric power systems. To ensure their validity, SCADA-based state estimation methods have long been used in the energy management systems of power networks as a means to detect, identify, and suppress grossly erroneous measurements, and possibly parameter and topology errors (see, for example, Monticelli (1999), Choi and Malek (1988) and Mili et al. (1996)). However, a reliable estimation of the power system operating states in the presence of inaccurate measurement data remains a challenge for real-time system monitoring. To further improve the performance of the measurement cross-validation procedures of a state estimator, additional information needs to be incorporated into sensor FDI schemes.

Presented herein is a technique that employs fuzzy logic to fuse diverse data into sensory network assessment. Specifically, to enhance the state estimation model currently in use, we consider utilising information that is normally available to the operator, such as historical usage trends, weather, and system and component reliability data (Lin and Holbert, 2005). However, through inclusion of this additional information, the engineering interpretations become highly subjective and context dependent – a situation for which fuzzy logic is well suited.

In our approach, fuzzy logic is utilised to create a Hybrid Fuzzy System (HFS) that aims to combine information from multiple domains in order to detect, isolate, identify, and mitigate threats to the SCADA system of interest. As shown in the block diagram of the HFS displayed in Figure 6, the HFS includes a component that observes the system to learn its normal operating states. Another part of the HFS contains a database of information including component reliability, sensor accuracy and maintenance records. Having ‘learned’ the system, the next part is to integrate fuzzy logic with a state estimator. The knowledge base in a fuzzy system is typically a set of If-Then rules; however, the knowledge base in this research must incorporate additional information (hence the hybrid nature) with the physical relations and operator experience. The additional information (e.g., secondary measurements) is fuzzified. For instance, in Figure 7, each fuzzy set (i.e., high, mid, and low reliability) is characterised by a unique membership function. The significance of fuzzy variables is that they facilitate gradual transitions between states (note the overlap of the fuzzy sets in Figure 7) and, consequently, possess a natural capability to express and compensate for observation and measurement uncertainties. Traditional variables, referred to as crisp variables, do not have this ability.

The only condition a membership function must satisfy is that it must vary between zero and one. The function itself can be an arbitrary curve whose shape can be defined as a function that is appropriate from the point of view of simplicity, convenience, speed, and efficiency (Yen and Langari, 1999). For instance, the reliability fuzzy variable could be constructed along the lines of the classic bathtub curve.


Figure 6 Block diagram of the proposed HFS (primary and secondary measurements and other real-time sensor information feed the state estimation and fuzzy logic blocks – fuzzifier, inference engine and defuzzifier – supported by a knowledge base of If-Then rules, physical relations and operator experience, and a database of component reliability, historical trends, maintenance records and sensor accuracy)

Figure 7 Fuzzy membership functions for reliability data (high, mid and low fuzzy sets over the sensor reliability, λ, in outages/yr, spanning roughly 0.1–0.5)

After fuzzification, these sensor reliability data are provided to an array of p fuzzy (If-Then) rules, R = {R^1, R^2, …, R^p}. The rule set contains the knowledge and consists of well-formed logical formulae. The inference process is based on these rules. Consider a generalised case, with p fuzzy rules, m input variables and one output. The i-th rule is written as:

R^i: \text{If } (x_1 \text{ is } A_1^i) \text{ op } \ldots \text{ op } (x_j \text{ is } A_j^i) \ldots \text{ op } (x_m \text{ is } A_m^i), \text{ then } (y \text{ is } B^i)    (8)

where x_j and y are the inputs and output, respectively, A_j^i and B^i are fuzzy variables, and ‘op’ stands for a fuzzy operation such as ‘or’ and ‘and’. With the input data applied, the conclusion of each rule attains a particular membership value. Each of the p fuzzy rules then submits its membership value to the defuzzifier. Finally, a decision is made by defuzzifying this information.
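As an illustration of Equation (8), the sketch below evaluates a single two-antecedent rule with ‘and’ taken as the min operator, using reliability fuzzy sets in the style of Figure 7; the breakpoints are assumptions read approximately from the figure, not the authors’ values.

# Minimal evaluation of one rule of the form of Equation (8):
#   R: If (x1 is A1) and (x2 is A2), then (y is B)
# 'and' is taken as min and 'or' as max, one common choice of fuzzy operators.

def mu_high_rel(lam):   # 'high reliability' set of Figure 7 (few outages/yr)
    return max(0.0, min(1.0, (0.25 - lam) / 0.15))   # breakpoints are assumptions

def mu_low_rel(lam):    # 'low reliability' set of Figure 7 (many outages/yr)
    return max(0.0, min(1.0, (lam - 0.25) / 0.15))

def rule_strength(antecedents, op="and"):
    """antecedents: list of membership degrees mu_{A_j^i}(x_j) for one rule."""
    return min(antecedents) if op == "and" else max(antecedents)

# "If the reliability of Sensor A is high and the reliability of Sensor B is
#  low, then prefer Sensor A" -- the worked example in the text.
lam_A, lam_B = 0.15, 0.45          # outages/yr, e.g. M31 and M13 of Table 1
strength = rule_strength([mu_high_rel(lam_A), mu_low_rel(lam_B)])
print(strength)   # degree to which the conclusion 'prefer A' is supported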


Continuing the earlier example using reliability: if the reliability of Sensor A is high and the reliability of Sensor B is low, then measurements from A are deemed more reliable, and we choose Sensor A over Sensor B.

To develop the methodology of the overall HFS, the rules are created to be system independent. Considering the human approach to validating results from a state estimator, a reasonable approach is to first focus upon individual sensor status before correlating results between measurements. For this reason, the HFS uses a two-stage process, as shown in Figure 8. The first stage applies a uniform set of rules (knowledge) to each sensor by itself. The second stage then takes the first-stage status results from all the measurements and additional information, and it employs knowledge/rules, which are global in nature, to reach a final overall conclusion.

Figure 8 Two-stage fuzzy logic decision-making process (each of the N sensors feeds a Stage 1 evaluation drawing on the general database; the individual sensor statuses feed Stage 2, which uses the tree-structured database to reach an overall conclusion)

The data used for the first-stage system include measurements and information stored in a general database. The fuzzy output (the sensor status) describing each measurement is classified into three categories, termed valid, suspect, and failed. Whereas the first-stage system focuses on separately analysing the status of each sensor, the second-stage system performs overall sensor status evaluation using system-independent intercomparisons between sensor readings and information. In other words, the second-stage system is designed to independently assess and confirm/deny the sensor status as classified by the first-stage system. The information employed for the second-stage system is derived from the general database, which is a tree-structured database as depicted in Figure 9, and is fed by results from the first-stage system as shown in Figure 8. The general database embraces all the information used for the first-stage system. Finally, each sensor output is classified into four groups arranged in order of decreasing dependability of the sensor, namely Valid, Inspection Warranted, Immediate Inspection Required, and Failed.


In order to make this approach applicable to power systems with different network topology structures, the second-stage rules are developed with generic, system-independent If-Then clauses. The tree-structured database is a linked list that facilitates the dynamic creation of system-specific rules from general relationships. It provides a physical description of the infrastructure of interest and knowledge of its instrumentation network, including sensor types, locations, manufacturers, model numbers, nearby sensors, and so forth. The tree-structured database is constructed to ensure that the HFS is portable between diverse network topologies and adaptable when either the sensory network or the infrastructure is modified.

Figure 9 The tree-structured database containing a physical description of the power system and sensory network (the system is decomposed by measurand type – temperature, flow, pressure – into sensors T1–T4, F1–F3 and P1–P3, which are in turn mapped to locations: Station 1, Station 2 and Base)
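One possible realisation of such a tree-structured database is sketched below. The node fields, the neighbourhood criterion and the generated rule text are illustrative assumptions rather than the authors’ implementation; the point is only that system-specific second-stage rules can be instantiated mechanically from the generic structure.

from dataclasses import dataclass, field
from typing import List

@dataclass
class SensorNode:
    name: str            # e.g. "T1"
    measurand: str       # temperature / flow / pressure
    location: str        # e.g. "Station 1"
    manufacturer: str = ""
    model: str = ""

@dataclass
class SystemTree:
    sensors: List[SensorNode] = field(default_factory=list)

    def neighbours(self, sensor: SensorNode):
        """Sensors of the same measurand type at the same location."""
        return [s for s in self.sensors
                if s is not sensor
                and s.measurand == sensor.measurand
                and s.location == sensor.location]

    def second_stage_rules(self):
        """Instantiate generic 'compare with nearby like sensors' rules."""
        for s in self.sensors:
            for n in self.neighbours(s):
                yield (f"If {s.name} is suspect and {n.name} is valid and "
                       f"{s.name} disagrees with {n.name}, "
                       f"then {s.name} requires immediate inspection")

# A fragment of the network of Figure 9 (sensor-to-station mapping is illustrative)
tree = SystemTree([
    SensorNode("T1", "temperature", "Station 1"),
    SensorNode("T2", "temperature", "Station 1"),
    SensorNode("P1", "pressure", "Station 1"),
    SensorNode("F3", "flow", "Station 2"),
])
for rule in tree.second_stage_rules():
    print(rule)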

4 ABFT techniques for power system state estimation

Because our SCADA intrusion detection scheme relies on the state estimation algorithm, we address in this section one aspect of numerical robustness not considered in the specialised literature, namely robustness against errors caused by transients or permanent failures in computer hardware and software.

4.1 Principles of ABFT methods

State estimation in power systems involves very tedious computations, since large systems can consist of thousands of buses and lines (Monticelli, 1999; Bose and Clements, 1987; Abur and Exposito, 2004). However, computations of that magnitude are always prone to errors caused by transient or permanent failures in hardware and, to a certain extent, software failures as well (buffer overflows, etc.). Computationally efficient and accurate results are required, since power system security monitoring and analysis hinge on them.


Algorithm Based Fault Tolerance (ABFT) is a cost-effective technique to detect and correct errors caused by permanent or transient failures in the hardware. ABFT is based on encoding the data at the system level in the form of some error correcting or error detecting code and then designing algorithms that operate on encoded input data to produce encoded output data. One of the key advantages of ABFT that makes it very attractive is that it does not require any modifications to the underlying hardware to provide fault tolerance. This technique has already been applied to numerical algorithms running on array processors, e.g., matrix multiplication and LU factorisation (Huang and Abraham, 1984; Jou and Abraham, 1984), the fast Fourier transform (Choi and Malek, 1988; Jou and Abraham, 1988), QR factorisation (Reddy and Banerjee, 1990), singular value decomposition (Chen and Abraham, 1986), the Laplace equation (Roy-Chowdhury et al., 1996), and the multigrid method (Mishra and Banerjee, 1999; 2003). Later, the same set of problems was addressed using parallel algorithms (Balasubramaniam and Banerjee, 1990; Banerjee et al., 1990) running on general-purpose multiprocessors. The ABFT technique is known as Algorithm Based Error Detection (ABED) when applied only for the detection of errors.

The ABED technique provides fault tolerance by preserving an invariant before the start and at the end of all computations in an algorithm. It can also preserve an invariant during the computations by slightly modifying the algorithm. Lack of preservation of the invariant during the course of the algorithm signals the presence of errors in the computation due to faults. Since most of these computations are performed on computers with finite precision, invariants are sometimes not preserved because of round-off errors in the computation of the invariant and in the algorithm, so the ABFT technique can signal the presence of faults even in the absence of any actual hardware fault. This situation is known as a false alarm. Invariant computations should therefore be corrected for round-off errors, using some kind of tolerance, to eliminate or minimise false alarms. It has been shown that methods that compute the tolerance using error analysis of the algebraic manipulations in the algorithm are superior to methods that compute the tolerance in an ad hoc fashion (Roy-Chowdhury and Banerjee, 1993). The error analysis provides upper bounds on the round-off errors that can occur in the computation of an expression on a machine with finite precision. Modifying the invariants by adding the computed value of the round-off error accumulated in the algorithm removes the problem of false alarms for any data set of any magnitude or dimension.

The solution of a large set, of the order of several thousand, of simultaneous algebraic equations is still an important issue in numerical linear algebra. Such large systems of equations often arise in load flow studies and state estimation in power systems, as well as when a partial differential equation modelling an electromagnetic field, heat conduction or fluid flow is discretised using finite differences or finite elements. Typical methods for solving a system of linear algebraic equations are either direct or iterative. Gaussian elimination and all its variants fall under the category of direct methods. A direct method computes the solution to a finite precision in a finite number of operations. Operation counts for Gaussian elimination vary from O(N^3) to O(N^2 log N) for N unknowns, depending upon the properties of the coefficient matrix and how the sparsity structure of the matrix is exploited. The minimum operation count that can be achieved, O(N^2), is for the minimum degree algorithm, which is nearly optimal (Golub and Loan, 1987).


This section presents an application of ABFT to the coefficient matrix obtained from a power system state estimation algorithm and solved using Gaussian elimination. We use error analysis to derive expressions for the upper bound on round-off errors in the Gaussian elimination algorithm, which are then used to correct the invariants employed in the checking step. We show that the modified Gaussian elimination algorithm has low computational overhead and is free from false alarms.

4.2 Weighted least squares state estimation in power systems

The role of a static state estimator is to provide a complete, coherent, and reliable base-case power flow solution for contingency analysis and the load forecasting function, among others (Monticelli, 1999; Bose and Clements, 1987). The state estimator processes a redundant collection of measurements based on a mathematical model that relates these measurements to the nodal voltage magnitudes and phase angles, which are taken as the state variables of the system. This model is derived from Kirchhoff’s and Ohm’s laws and hinges on several assumptions, which include:

• random measurement errors that are Gaussian and uncorrelated, with zero mean and known diagonal covariance matrix, R = diag(σ_i^2)

• a balanced three-phase system

• no time-skew between the metered values

• the exactness of the pi-equivalent models of the lines and transformers and of their parameters

• a known topology of the network.

The latter is determined by a topology processor from the metered or given status of the circuit breakers of system equipment such as lines, transformers, capacitors and inductors, and Flexible AC Transmission System (FACTS) devices, to name a few. A general state estimation model is given by:

z = h(x) + e    (9)

where z is the measurement vector, e is the measurement error vector, and h(x) is a vector-valued real function. The latter is derived from the foregoing assumptions; in particular, it depends on the given values for the line and transformer parameters and the given network topology. Many commercial software packages seek a solution to the WLS normal equation:

H^T R^{-1} [z - h(x)] = 0    (10)

through the Gauss-Newton iterative algorithm (Abur and Exposito, 2004) expressed as:

(H^T R^{-1} H) \Delta x^{(k)} = H^T R^{-1} \Delta z^{(k)}    (11)

where H = \partial h(x)/\partial x is the Jacobian matrix, \Delta z^{(k)} = z - h(x^{(k)}) is the residual vector, and \Delta x^{(k)} = x^{(k+1)} - x^{(k)} is the increment of the state vector at the kth iteration step. The error covariance matrix of the state estimates, Cov(δx) = Σ_x, and of the estimates associated with the measurements, Cov(δz) = Σ_z, are respectively given by:

\Sigma_x = (H^T R^{-1} H)^{-1}    (12)

\Sigma_z = H (H^T R^{-1} H)^{-1} H^T    (13)


The diagonal elements of Σx and Σz are the variances of the estimate errors. Under the assumptions stated earlier and under the validity of the linear approximation of the model given by Equation (10) around the solution, these variances provide a measure of the accuracy of the estimates; the smaller they are, the more accurate the state estimator solution is.
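A sketch of the Gauss-Newton iteration of Equation (11), together with the covariance matrices of Equations (12) and (13), is given below. The measurement function h and its Jacobian are supplied by the caller, and the simple convergence test is an assumption for illustration.

import numpy as np

def wls_state_estimate(z, h, jac, R, x0, tol=1e-6, max_iter=20):
    """Gauss-Newton solution of the WLS normal equation (Equations (10)-(11)).

    z    : measurement vector
    h    : callable, h(x) -> predicted measurements
    jac  : callable, jac(x) -> Jacobian H = dh/dx
    R    : measurement error covariance (diagonal), shape (m, m)
    x0   : initial state (e.g. a flat start)
    """
    x = np.asarray(x0, dtype=float)
    Rinv = np.linalg.inv(R)
    for _ in range(max_iter):
        H = jac(x)
        dz = z - h(x)                          # residual vector
        G = H.T @ Rinv @ H                     # gain matrix of Equation (11)
        dx = np.linalg.solve(G, H.T @ Rinv @ dz)
        x = x + dx
        if np.linalg.norm(dx, np.inf) < tol:
            break
    Sigma_x = np.linalg.inv(G)                 # Equation (12)
    Sigma_z = H @ Sigma_x @ H.T                # Equation (13)
    return x, Sigma_x, Sigma_z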

4.3 ABFT encoded Gaussian elimination procedures

The solution of a set of linear algebraic equations of the form Ax = b for a single right-hand side b is usually obtained by performing Gaussian elimination to create an upper triangular matrix. The system is then easily solved by back substitution from the last unknown upwards. Here the matrix A = H^T R^{-1} H is given by Equation (11). The basic algorithm (incorporating no error detection features) proceeds as follows. For a system of size m there are m - 1 iterations in the algorithm, from 1 to m - 1, during each of which one more column of zeros is introduced into matrix A. Iteration k of the algorithm results in the update of an (n - k) × (n - k) sub-matrix. The update step of the algorithm can be given as:

a_{ij}^{(k+1)} = a_{ij}^{(k)} - a_{kj}^{(k)} a_{ik}^{(k)} / a_{kk}^{(k)}, \quad k < i \le m, \; k < j \le m    (14)

where a_{ij}^{(k)} represents the element in the ith row and jth column of A at the start of the kth iteration. Error detection capability is introduced into the algorithm through row and column checksums, which are described below. Let us denote the ith row of A by row_i. We also denote the column checksum of the jth column over the first k - 1 rows at the start of the kth iteration by CCA_j^{(k)} and the column checksum of the remaining rows by CCB_j^{(k)}. We denote the row checksum of row i at the start of the kth iteration by RC_i^{(k)}. At the start of the kth iteration of the algorithm, assuming error-free operation of the algorithm for the first k - 1 steps, the following invariants are maintained:

I1 – RC_i^{(k)} for all rows;
I2 – CCA_j^{(k)} and CCB_j^{(k)} for all columns.

The algorithm computes the row checksums of all the rows and the partial column checksums. At the start of the first iteration, invariants I1 and I2 are correctly maintained for each row and each column. Then, at the start of the kth iteration, we first update the tolerances for the row and column checksums for the current iteration using the expressions derived in the next section. We next perform row and column checksum checks on the kth row and column of the results of the previous iteration. Since a_{ik}^{(k)} is known for all rows, the column checksum for column k can be easily computed. Then a row checksum check on row k is performed. If both the row checksum and the column checksums pass, there is no error in the kth row and kth column. The row checksums are updated in the following manner:

RC_i^{(k+1)} = RC_i^{(k)} - a_{ik}^{(k)} RC_k^{(k)} / a_{kk}^{(k)}, \quad \forall i > k    (15)

and the column checksums CCA and CCB as:

CCA_j^{(k+1)} = CCA_j^{(k)} + a_{kj}^{(k)}, \quad \forall j    (16)

CCB_j^{(k+1)} = CCB_j^{(k)} - a_{kj}^{(k)} CCB_k^{(k)} / a_{kk}^{(k)}, \quad \forall j > k    (17)

We then proceed to update the rows of the (n - k) × (n - k) sub-matrix using Equation (14). If no errors are introduced in the kth iteration, the start of the (k + 1)th iteration finds I1 and I2 preserved.
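The following simplified sketch illustrates the checksum-encoded elimination. It collapses the CCA/CCB bookkeeping into a single column checksum per column and uses a crude machine-epsilon tolerance rather than the derived bounds of Equations (21)–(24), so it conveys the idea rather than reproducing the authors’ exact procedure.

import numpy as np

def abed_gaussian_elimination(A, tol_scale=100.0):
    """Gaussian elimination with checksum checks (a simplified ABED sketch).

    One row checksum per row and one column checksum per column are kept
    and verified at every iteration.  The tolerance is a rough
    machine-epsilon heuristic, not the derived bounds of Equations (21)-(24).
    No pivoting is performed (the gain matrix is assumed well conditioned).
    """
    A = np.array(A, dtype=float)
    m = A.shape[0]
    row_cs = A.sum(axis=1)                       # RC_i, invariant I1
    col_cs = A.sum(axis=0)                       # column checksums, invariant I2
    eps = np.finfo(float).eps

    for k in range(m - 1):
        tol = tol_scale * m * eps * np.abs(A).sum()
        # Invariant check: recompute checksums and compare against the stored ones
        if (np.abs(A.sum(axis=1) - row_cs).max() > tol or
                np.abs(A.sum(axis=0) - col_cs).max() > tol):
            raise RuntimeError(f"checksum mismatch detected at iteration {k}")

        pivot = A[k, k]
        mult = A[k + 1:, k] / pivot              # multipliers a_ik / a_kk
        A[k + 1:, k + 1:] -= np.outer(mult, A[k, k + 1:])   # Equation (14)
        A[k + 1:, k] = 0.0

        row_cs[k + 1:] -= mult * row_cs[k]       # Equation (15)
        s = mult.sum()                           # consistent column checksum updates
        col_cs[k] -= pivot * s
        col_cs[k + 1:] -= A[k, k + 1:] * s
    return A, row_cs, col_cs

# Example: a small symmetric positive-definite stand-in for H^T R^-1 H
rng = np.random.default_rng(1)
M = rng.normal(size=(6, 6))
U, _, _ = abed_gaussian_elimination(M @ M.T + 6.0 * np.eye(6))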

4.3.1 Error analysis

In the following, we provide the derivations of the error expressions for the Gaussian elimination algorithm described by Equation (14). We denote by err(a) the upper bound on the absolute error associated with a variable a that results from a floating-point operation on a. We can therefore write, for a variable ν,

|\nu^*| \le |\nu| + |\mathrm{err}(\nu)|    (18)

knowing that any floating-point operation z = fl(x ⊕ y) results in an error err(z) = (x ⊕ y)δ, where |δ| ≤ ε = 2^{-t} for a t-digit mantissa (Banerjee et al., 1990). In these derivations, the variables LE(x) and GE(x) denote the local error resulting from the last computation of a quantity x and the global (accumulated) error over all steps of the computation, respectively. The variable EB(x) is used to store an upper bound on the absolute value of the global error in x; therefore, EB(x) ≥ |GE(x)|. In the derivations that follow, the O(ε²) terms are dropped from the inequalities wherever they arise. Equation (14) leads to an algebraic expression that must be analysed for round-off errors in the Gaussian elimination, of the form:

z^* = fl(\nu^* - w^* x^* / y^*)    (19)

where \nu^* = \nu + GE(\nu), \ldots, z^* = z + GE(z). Then we have:

|GE(z)| \le |x\,GE(w)/y| + |w\,GE(x)/y| + |\nu|\,\varepsilon + 3\,|w x / y|\,\varepsilon + |w x\,GE(y)/y^2| + |GE(\nu)|    (20)

The derivation of Equation (20) is given in Appendix B. Using Equations (20) and (14), we can show that the following recurrence for the upper bound on the error in an individual element holds:

EB(a_{ij}^{(k+1)}) \leftarrow \left(|a_{ij}^{(k)}| + 3\,|a_{ik}^{(k)} a_{kj}^{(k)} / a_{kk}^{(k)}|\right)\varepsilon + EB(a_{ij}^{(k)}) + |a_{ik}^{(k)} EB(a_{kj}^{(k)}) / a_{kk}^{(k)}| + |a_{kj}^{(k)} EB(a_{ik}^{(k)}) / a_{kk}^{(k)}| + |a_{ik}^{(k)} a_{kj}^{(k)} EB(a_{kk}^{(k)}) / (a_{kk}^{(k)})^2|    (21)

This expression can be further simplified to:

EB(a^{(k+1)}) \leftarrow 4\,|a_{kk}^{(k)}|\,\varepsilon + EB(a_{kk}^{(k)})    (22)

which is arrived at by using the simplification |a_{kk}^{(k)}| \ge |a_{ij}^{(k)}| in Equation (21). Using Equation (22) and the fact that n - k elements are summed in the kth iteration, the following approximate error expression for the column checksums is obtained:

EB(CCB^{(k+1)}) \leftarrow (n - k)\, EB(a^{(k+1)})    (23)

Similarly, one can derive the error bound expression for the row checksum using the procedure described above; it is given by:

EB(RC_i^{(1)}) \leftarrow \left( \sum_{j=1}^{n} (n + 1 - i)\,|a_{ij}| \right) \varepsilon, \quad \forall i    (24)

These error expressions are used to compute error bounds, which are then added to the invariants to protect the algorithm from reporting false alarms.

5 Simulation results

The HFS together with the ABED method was applied to various power systems and their performance evaluated. A summary of the simulation results is provided next.

5.1 Application of the HFS to a four-bus test system

The application of the HFS to an electric power system is illustrated with a simple example. The four-bus network depicted in Figure 10 is represented with a DC model using the branch reactances shown. This model is obtained by linearising Equation (9) around nodal voltage phasors of 1 p.u. ∠ 0° while neglecting the series resistances and the shunt capacitances of all the lines. The power system is provided with four sensors, denoted M, which measure real power. The phase angles, {θ_i, i = 1, 2, 3}, are the unknown state variables, with θ_4 set equal to zero as the reference.

Figure 10 Four-bus electric power system with four sensors (branch reactances X_12 = 0.25 p.u., X_13 = 0.50 p.u., X_24 = 0.40 p.u., X_34 = 0.10 p.u.; power measurements M12, M13, M31 and M3; Bus 4 is the reference with θ_4 = 0)

Table 1 lists the power measurement values at a particular time instant along with the measurement error (σ) and sensor reliability (λ). The sensors have accuracies of the same order of magnitude; however, the reliability of the individual sensors varies more widely. The sensor reliability data are fuzzified as was shown in Figure 7.

Table 1 Power measurements at a particular time instant along with the sensor accuracy and reliability

Sensor   Power measurement (p.u.)   Sensor error, σ (p.u.)   Sensor reliability, λ (outages/year)
M13      0.705                      0.01                     0.45
M31      0.721                      0.01                     0.15
M12      0.212                      0.02                     0.21
M3       0.920                      0.015                    0.10

For the linear DC model, the WLS state estimation solution is given by:

\hat{x} = (H^T R^{-1} H)^{-1} H^T R^{-1} z    (25)

where:

H = \begin{bmatrix} 2 & 0 & -2 \\ -2 & 0 & 2 \\ 4 & -4 & 0 \\ -2 & 0 & 12 \end{bmatrix}    (26)

R = \begin{bmatrix} 10^{-4} & 0 & 0 & 0 \\ 0 & 10^{-4} & 0 & 0 \\ 0 & 0 & 4 \times 10^{-4} & 0 \\ 0 & 0 & 0 & 2.25 \times 10^{-4} \end{bmatrix}    (27)

The weighted least squares estimate is then calculated as:

\hat{x} = \begin{bmatrix} 0.0872 \\ 0.0342 \\ 0.0912 \end{bmatrix}    (28)

After performing state estimation, the chi-square test shows that the sum of the squared residuals is greater than the threshold value of 6.64 for a significance level (α) of 0.01. The normalised residuals of M13, M31, M12 and M3 are 100.83, 100.83, 0 and 0, respectively. From these standardised errors, M13 and M31 both appear to carry bad measurements; however, the simultaneous removal of these two measurements would make the system unobservable, so that state estimation would fail. To avoid state estimation breakdown, the sensor reliability data are taken into consideration. In general, if the reliability of Sensor A is high and the reliability of Sensor B is low, then measurements from A are deemed more reliable, and we choose Sensor A over Sensor B. In this particular case, since λ_31 (0.15 outages/year) is less than λ_13 (0.45 outages/year), measurements from M31 are believed more reliable and are chosen over M13. Therefore, M13 can be removed, and state estimation can be performed again to obtain the best estimates.
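The sketch below reproduces the DC WLS computation of Equations (25)–(28) for this example. The normalised-residual calculation uses the standard residual-covariance formula, which is an assumption about the exact procedure used by the authors.

import numpy as np

# Four-bus DC example: measurements M13, M31, M12 and M3 of Table 1, and the
# H and R matrices of Equations (26)-(27).
H = np.array([[ 2.0,  0.0, -2.0],
              [-2.0,  0.0,  2.0],
              [ 4.0, -4.0,  0.0],
              [-2.0,  0.0, 12.0]])
R = np.diag([1e-4, 1e-4, 4e-4, 2.25e-4])        # sigma^2 values from Table 1
z = np.array([0.705, 0.721, 0.212, 0.920])

Rinv = np.linalg.inv(R)
G = H.T @ Rinv @ H
x_hat = np.linalg.solve(G, H.T @ Rinv @ z)      # Equation (25); approx. [0.0872, 0.0342, 0.0912]

r = z - H @ x_hat                               # measurement residuals
J = float(r @ Rinv @ r)                         # chi-square statistic (threshold 6.64 at alpha = 0.01)

# Normalised residuals |r_i| / sqrt(Omega_ii) with Omega = R - H Sigma_x H^T;
# zero-variance (critical) residuals are reported here as 0.
Omega = R - H @ np.linalg.inv(G) @ H.T
d = np.diag(Omega)
r_norm = np.where(d > 1e-12, np.abs(r) / np.sqrt(np.maximum(d, 1e-12)), 0.0)
print(x_hat, J, r_norm)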


5.2 Performance assessment of the ABED method

The Gaussian elimination algorithm described in Section 4 was implemented on a SUN Enterprise server for experimental evaluation. The error coverage results for the ABFT are obtained by fixing the range and the size of the data sets and then varying one parameter while holding the others constant. The results reported here are the error coverage, the significant error coverage for significance levels of two and ten, and the error acceptance level, to display the robustness of the algorithm. Here the significance level γ characterises a computation error in which the norm of the deviation of the faulty result from the exact result is bounded by γ times the norm of the deviation of the fault-free result from the exact result. The Error Acceptance Level (EAL) of a test is defined as the smallest value of the significance γ that results in 100% error detection by the test. These comparisons are performed for transient bit-level floating-point addition errors, which are expected to have the lowest coverage of all classes of errors. We also report results on the timing overhead of this scheme as compared to the basic Gaussian elimination algorithm with no checks. We finally report the error latency of this algorithm for transient bit-level floating-point errors, since these are expected to be the hardest to detect.

5.2.1 Error coverage

Tables 2 and 3 contain the error coverage results for the Gaussian elimination algorithm. Both tables contain the following columns representing the percentage-computed values: FA (False Alarm), EC (Error Coverage), SEC (2) (Significant Error Coverage at significance level 2) and SEC (10) (Significant Error Coverage at significance level 10). The tolerance values used in these results originated from the error analysis that was presented earlier. Table 2 shows the error coverage results with respect to the data range. The error coverage varies from 80% to 91% for a data range of 0.1 to 10000, respectively. This low variation suggests that the scheme is very robust against data range variation.

Table 2 Percentage error coverage on varying range of data for transient bit-level errors in floating point additions

Data range   FA   EC   SEC (2)   SEC (10)   EAL
0.1          0    80   87        97         17.05
1            0    79   90        98         8.17
10           0    84   91        99         5.37
100          0    86   93        100        3.73
1000         0    90   94        99         5.60
10000        0    91   96        100        2.38

Table 3 Percentage error coverage on varying data size for transient bit-level errors in floating point additions

Matrix size   FA   EC   SEC (2)   SEC (10)   EAL
30            0    84   91        100        3.58
60            0    86   93        100        3.73
100           0    86   97        100        2.51
125           0    84   94        99         2.50
250           0    84   90        99         2.30
500           0    84   90        99         2.25

Table 3 shows the variation of error coverage with the change in the size of the data set, other parameters being held constant. The error coverage and error acceptance levels do not show much variation with data set size either. From the results contained in Tables 2 and 3, it is clear that the scheme is robust, achieving high error coverage and low error acceptance levels across data sets of widely varying ranges and sizes. Significant error coverage levels are in the range of 87% to 96%. Error acceptance levels are around 20 in the worst case, while error acceptance levels of around ten are common. Note that an error acceptance level of 100 would correspond to an error of a fraction of 1% in the final results for almost all applications. The scheme presented in the paper is not observed to give any false alarms, as seen from the FA column of Tables 2 and 3. Note that the coverage results for memory errors as well as truly gross errors, such as word-level errors and control flow errors, were found to be always 100%, but these are reported elsewhere (Mishra et al., 2004).

5.2.2 Timing results

For any proposed error detection algorithm, it is desired that the error detection latency be as small as possible. The error latency is defined as the time taken between the appearance of an error and its detection. One is interested in devising error detection schemes with low error latency in order to minimise data corruption. The error latency is calculated in two ways. First, error latency is reported as the number of checks that are passed without the error being detected after the injection of the error. This is achieved by setting a flag when the error is first injected in a run and incrementing a counter every time a check is passed thereafter with the error not being detected. An error may be detected in subsequent checks even if it was not detected in the first check following its insertion, due to error magnification, propagation, or the occurrence of additional errors. Second, error latency is also reported as a percentage of the total runtime.

From Figure 11, we note that the overhead incurred in introducing checks into the basic algorithm can be amortised by going to larger matrix sizes. One can see from the figure that the overhead decreases with the increase in the problem size. This is not unexpected, since the check steps involve O(N^2) operations while the basic algorithm requires O(N^3) operations for a system of size N. Thus, the additional overhead caused by the introduction of the check steps is not of concern for large problem sizes.


Figure 11  Time overhead of the current scheme over the algorithm with no checks (% time overhead versus matrix dimension, for matrices of dimension 125 to 500)

5.2.3 Error latency

Figures 12 and 13 show the error latencies for the Gaussian elimination application under transient bit-level floating-point errors, for various bits in error, starting with the first bit whose error was consistently detected. As the erroneous bit position becomes more significant, the error latency decreases until, beyond a certain bit, all bit-level errors are detected in the first check following the error. For larger problem sizes, the error latency for errors in more significant bits corresponds to a smaller percentage of the total runtime, since the error is detected at the first check following its occurrence. Because there is a check after each iteration, and for larger problem sizes the execution time of each iteration is a smaller fraction of the total runtime, the error latency also decreases with matrix size for errors in more significant bits. This need not be true for errors in less significant bits, even though the latency measured as the number of checks passed following error injection was often observed to decrease with problem size. Nevertheless, the figures indicate that errors in more significant bits were detected in the first check following error injection, and the latency amounted to no more than 5% of the total runtime. Error latency studies were also conducted for word-level floating-point errors and all other classes of injected errors; in all of our simulation runs, these errors were detected at the very next check following their injection.
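The latency bookkeeping described above can be sketched as a small harness that flags the injection step and counts how many checks pass before one fails. The function and its callbacks are hypothetical names used only for illustration, not the authors' simulator.

```python
# Hypothetical bookkeeping for error-detection latency: count how many
# checks are passed after a fault is injected before a check finally fires.

def run_with_latency_tracking(steps, inject_at, do_step, run_check):
    """do_step(k) advances the computation; run_check(k) returns True if the
    ABED check passes at step k.  Returns the latency as the number of checks
    passed between injection and detection, or None if the error escapes."""
    injected = False
    checks_passed_since_injection = 0
    for k in range(steps):
        if k == inject_at:
            injected = True              # fault injected during this step
        do_step(k)
        if run_check(k):
            if injected:
                checks_passed_since_injection += 1
        else:
            # Check failed: latency is the number of checks passed since injection.
            return checks_passed_since_injection if injected else 0
    return None                          # undetected error (counts against coverage)

# Example: a toy run whose check starts failing two checks after the injection.
detected_from = 5
print(run_with_latency_tracking(steps=10, inject_at=3,
                                do_step=lambda k: None,
                                run_check=lambda k: k < detected_from))   # prints 2
```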


Figure 12  Error latency as the number of checks passed before detection (number of checks passed versus bit in error, for matrix sizes 250×250, 300×300 and 500×500)

Figure 13  Error latency as a percentage of the total runtime (percentage of run time versus bit in error, for matrix sizes 250×250, 300×300 and 500×500)

6 Conclusions and future prospects

We have presented two diverse approaches to reducing the vulnerability of SCADA systems in electric power networks. The first method utilises a fuzzy logic-based state estimator to detect, characterise and mitigate instrumentation anomalies. The impetus for using fuzzy logic is its ability to transform information back and forth between the linguistic and numeric domains. The second technique employs algorithm-based error detection for the Gaussian elimination method.


Innovative sensors and sensor systems are necessary to monitor the health of vital infrastructures and their components. It is important to identify which sensors are the most critical to system monitoring. Reduction in instrumentation system susceptibility to adversarial agents may also be achievable through advances in sensing technology, including fibre-optic-based systems, microelectromechanical devices, and smart sensors that utilise integrated signal processing for self-calibration, self-diagnostics and two-way communication. Endowing the power system state estimation algorithms with some degree of numerical and statistical robustness is critical. In this paper, the ABED scheme was proposed for application to the Gaussian elimination method as used in state estimation and power flow calculations. We derived expressions for the round-off errors in the algebraic processes within the elimination procedure and modified the invariants used for checking to account for the accumulated round-off errors. Our results suggest that the algorithm provides error detection at low cost (less than 5% overhead for 500 unknowns) and excellent error coverage (98% to 100%) for floating point arithmetic in the presence of permanent bit and word errors. Moreover, the implemented algorithm is insensitive to the data range and the data size and reports no false alarms.

Acknowledgement

The authors acknowledge the support of the US National Science Foundation and Office of Naval Research through grant ECS-0224701.

References

Abshier, J. and Weiss, J. (2004) ‘Securing control systems: what you need to know’, Control, Vol. 17, No. 2, pp.43–48.
Abur, A. and Exposito, A.G. (2004) Power System State Estimation: Theory and Implementation, New York: Marcel Dekker.
Ananthanarayanan, V. and Holbert, K.E. (2003) ‘Fuzzy logic to detect and characterize signal jump and spike anomalies’, Proc. of the Thirty-fifth Annual North American Power Symposium (NAPS), Rolla, MO, October, pp.248–255.
Ananthanarayanan, V. and Holbert, K.E. (2004) ‘Power system sensor failure detection and characterization using fuzzy logic’, Proceedings of the Seventh IASTED International Conference on Power and Energy Systems, Clearwater, FL, 28 November – 1 December, pp.291–296.
Balasubramaniam, V. and Banerjee, P. (1990) ‘Tradeoffs in the design of efficient algorithm-based error detection schemes for hypercube multiprocessors’, IEEE Trans. Softw. Eng., Vol. 16, No. 2, pp.183–196.
Banerjee, P., Rahmeh, J.T., Stunkel, C., Nair, V.S., Roy, K., Balasubramaniam, V. and Abraham, J.A. (1990) ‘Algorithm-based fault tolerance on a hypercube multiprocessor’, IEEE Trans. Comput., Vol. 39, No. 9, pp.1132–1145.
Betta, G. and Pietrosanto, A. (2000) ‘Instrument fault detection and isolation: state of the art and new research trends’, IEEE Trans. on Instrumentation and Measurement, Vol. 49, No. 1, pp.100–107.
Bigham, J., Gamex, D. and Lu, N. (2003) ‘Safeguarding SCADA systems with anomaly detection’, Lecture Notes in Computer Science, Vol. 2776, pp.171–182.

Bose, A. and Clements, K.A. (1987) ‘Real-time modeling of power networks’, Proceedings of the IEEE, Vol. 75, No. 12, pp.1607–1622.
Byres, E. and Lowe, J. (2005) ‘Insidious threat to control systems’, InTech, Vol. 52, No. 1, pp.28–31.
Chen, C-Y. and Abraham, J.A. (1986) ‘Fault-tolerant systems for the computation of eigenvalues and singular values’, Proceedings of the SPIE, Vol. 696, pp.228–237.
Choi, Y-H. and Malek, M. (1988) ‘A fault-tolerant FFT processor’, IEEE Trans. Comput., Vol. 37, No. 5, pp.617–621.
Cox, E. (1992) ‘Fuzzy fundamentals’, IEEE Spectrum, Vol. 29, No. 10, pp.58–61.
Dagle, J. (2001) ‘Vulnerability assessment activities’, Proceedings of the IEEE Power Engineering Society Winter Meeting, Columbus, Ohio, January–February, pp.108–113.
Dagle, J.E., Widergren, S.E. and Johnson, J.M. (2002) ‘Enhancing the security of Supervisory Control And Data Acquisition (SCADA) systems: the lifeblood of modern energy infrastructures’, Proceedings of the IEEE Power Engineering Society Winter Meeting, New York, NY, January, p.635.
Das, M. and Hunt, T. (1998) ‘A comparative evaluation of various methods for detection of small spikes in data’, Proc. of the Midwest Symposium on Circuits and Systems, Notre Dame, IN, August, pp.510–513.
Denzer, D.P. (1997) ‘Technology allows SCADA integration into wide network’, Pipe Line and Gas Industry, February, Vol. 80, No. 2, pp.43–46.
Ebata, Y., Hayashi, H., Hasegawa, Y., Komatsu, S. and Suzuki, K. (2000) ‘Development of the intranet-based SCADA (Supervisory Control And Data Acquisition System) for power system’, Proceedings of the IEEE Power Engineering Society Winter Meeting, January, Vol. 3, pp.1656–1661.
Frank, P.M. (1990) ‘Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy – a survey and some new results’, Automatica, Vol. 26, No. 3, pp.459–474.
Golub, G.H. and Loan, C.F.V. (1987) Matrix Computations, Baltimore: Johns Hopkins University Press.
Heger, A.S., Holbert, K.E. and Ishaque, A.M. (1996) ‘Fuzzy associative memories for instrument fault detection’, Annals of Nuclear Energy, Vol. 23, No. 9, pp.739–756.
Himmelblau, D.M. and Bhalodia, M. (1995) ‘On-line sensor validation of single sensors using artificial neural networks’, Proc. of the American Control Conference, Seattle, WA, June, Vol. 1, pp.766–770.
Holbert, K.E. and Lin, K. (2004) ‘Reducing state estimation uncertainty through fuzzy logic evaluation of power system measurements’, Proc. of 8th International Conference on Probability Methods Applied to Power Systems, Ames, Iowa, September, pp.205–211.
Holbert, K.E. and Upadhyaya, B.R. (1990) ‘An integrated signal validation system for nuclear power plants’, Nuclear Technology, Vol. 92, No. 3, pp.411–427.
Holbert, K.E., Heger, A.S. and Alang-Rashid, N.K. (1994) ‘Redundant sensor validation by using fuzzy logic’, Nuclear Science and Engineering, Vol. 118, No. 1, pp.54–64.
Huang, K-H. and Abraham, J.A. (1984) ‘Algorithm-based fault tolerance for matrix operations’, IEEE Trans. Comput., Vol. C-33, pp.518–528.
Isermann, R. (1984) ‘Process fault detection based on modeling and estimation methods – a survey’, Automatica, Vol. 20, No. 4, pp.387–404.
Jou, J-Y. and Abraham, J.A. (1984) ‘Fault-tolerant matrix operations on multiple processor systems using weighted checksums’, Proceedings of the SPIE, Vol. 495, pp.94–101.
Jou, J-Y. and Abraham, J.A. (1988) ‘Fault-tolerant FFT networks’, IEEE Trans. Comput., Vol. 37, No. 5, pp.548–561.

Kirsch, H. and Kroschel, K. (1994) ‘Applying Bayesian networks to fault diagnosis’, Proceedings of the Third IEEE Conference on Control Applications, Glasgow, UK, August, Vol. 2, pp.895–900.
Klein, S.A. and Menendez, J.N. (1993) ‘Information security considerations in open systems architectures’, IEEE Trans. on Power Systems, Vol. 8, No. 1, pp.224–230.
Lemos, R. (2002) ‘E-terrorism; safety: assessing the infrastructure risk’, news.com, 26 August.
Lin, K. and Holbert, K.E. (2003) ‘State-of-the-art methods to protect power network sensory systems against intrusion’, Proceedings of the Thirty-fifth Annual North American Power Symposium, Rolla, MO, 20–21 October, pp.537–544.
Lin, K. and Holbert, K.E. (2005) ‘Design of a hybrid fuzzy classifier system for power system sensor status evaluation’, IEEE Power Engineering Society General Meeting, San Francisco, CA, June, pp.2414–2421.
Martire, G.S. and Nuttall, D.J.H. (1993) ‘Open systems and databases’, IEEE Trans. on Power Systems, Vol. 8, No. 2, pp.434–440.
McDonald, J.D. (Ed.) (2003) Electric Power Substations Engineering, Boca Raton: CRC Press.
Medida, S.A., Sreekumar, N. and Prasad, K.V. (1998) ‘SCADA-EMS on the internet’, Proceedings of the International Conference on Energy Management and Power Delivery, Singapore, March, pp.656–660.
Mehra, R.K. and Peschon, J. (1971) ‘An innovations approach to fault detection and diagnosis in dynamic systems’, Automatica, Vol. 7, No. 5, pp.637–640.
Mili, L., Cheniae, M.G., Vichare, N.S. and Rousseeuw, P.J. (1996) ‘Robust state estimation based on projection statistics’, IEEE Trans. on Power Systems, Vol. 11, No. 2, pp.1118–1127.
Mishra, A. and Banerjee, P. (1999) ‘An algorithm based error detection scheme for the multigrid algorithm’, Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (FTCS-29), Madison, pp.12–19.
Mishra, A. and Banerjee, P. (2003) ‘An algorithm based error detection scheme for the multigrid method’, IEEE Transactions on Computers, Vol. 52, No. 9, pp.1089–1099.
Mishra, A., Mili, L. and Phadke, A.G. (2004) ‘Algorithm based fault tolerant state estimation of power systems’, Proceedings of the 8th International Conference on Probabilistic Method Applied to Power Systems, Ames, Iowa, September, pp.174–179.
Monticelli, A. (1999) State Estimation in Electric Power Systems – A Generalized Approach, Kluwer Academic Publishers.
Morgenstern, V.M., Upadhyaya, B.R. and Benedetti, M. (1988) ‘Signal anomaly detection using modified cusum method’, Proc. of the 27th IEEE Conf. on Decision and Control, Austin, TX, December, Vol. 3, pp.2340–2341.
Oman, P., Schweitzer, E.O. and Frincke, D. (2000) ‘Concerns about intrusions into remotely accessible substation controllers and SCADA systems’, Western Protective Relay Conference, Spokane, Washington, October, p.16, http://tesla.selinc.com/techpprs.htm.
Padmanabh, A.B., Holbert, K.E. and Heydt, G.T. (2003) ‘The role of sensors in the reduction of power system vulnerability’, Proceedings of the Thirty-fifth Annual North American Power Symposium, Rolla, MO, 20–21 October, pp.552–558.
Park, S. and Lee, C.S.G. (1993) ‘Fusion-based sensor fault detection’, Proc. of the IEEE International Symposium on Intelligent Control, Chicago, IL, August, pp.156–161.
Patton, R.J. and Chen, J. (1993) ‘Advances in fault diagnosis using analytical redundancy’, IEE Colloquium on Plant Optimisation for Profit (Integrated Operations Management and Control), London, January, pp.6/1–6/12.
Patton, R.J., Frank, P.M. and Clark, R. (1989) Fault Diagnosis in Dynamic Systems, Theory and Applications, New York: Prentice Hall.
Pollet, J. (2002) ‘Developing a solid SCADA security strategy’, Sensors for Industry Conference (Sicon), Houston, Texas, November, pp.148–156.

Reddy, A.L.N. and Banerjee, P. (1990) ‘Algorithm-based fault detection for signal processing applications’, IEEE Trans. Comput., Vol. 39, No. 10, pp.1304–1308.
Roy-Chowdhury, A. and Banerjee, P. (1993) ‘Tolerance determination for algorithm-based checks using simplified error analysis techniques’, Twenty-Third Annual International Symposium on Fault-Tolerant Computing (FTCS-23), Toulouse, France, pp.290–298.
Roy-Chowdhury, A., Bellas, N. and Banerjee, P. (1996) ‘Algorithm-based error-detection schemes for iterative solution of partial differential equations’, IEEE Transactions on Computers, Vol. 45, No. 4, pp.394–407.
Sauter, T. and Schwaiger, C. (2002) ‘Achievement of secure internet access to fieldbus systems’, Microprocessors and Microsystems, Vol. 26, No. 7, pp.331–339.
Schwaiger, C. and Treytl, A. (2003) ‘Smart card based security for fieldbus systems’, Proceedings of the IEEE Conference on Emerging Technologies and Factory Automation, Lisbon, Portugal, September, pp.398–406.
Simani, S., Fantuzzi, C. and Patton, R.J. (2003) Model-based Fault Diagnosis in Dynamic Systems Using Identification Techniques, London: Springer.
Thomesse, J.P. (1998) ‘A review of the fieldbuses’, Annual Reviews in Control, Vol. 22, pp.35–45.
Waggoner, D. (2003) ‘No solid clues in 1995 fatal Amtrak wreck’, Arizona Republic, 13 December, http://www.azcentral.com/specials/special14/articles/1213coldcase13.html.
Williams, R.I. (2003) ‘New threats prompt renewed security scrutiny for product storage sites’, Oil and Gas Journal, Vol. 101, No. 9, pp.52–58.
Yen, J. and Langari, R. (1999) Fuzzy Logic: Intelligence, Control, and Information, New Jersey: Prentice Hall.
Zadeh, L.A. (1965) ‘Fuzzy sets’, Information and Control, Vol. 8, No. 3, pp.338–353.
Zadeh, L.A. (1988) ‘Fuzzy logic’, Computer, Vol. 4, No. 4, pp.83–93.
Zecevic, G. (1998) ‘Web based interface to SCADA system’, Proceedings of the International Conference on Power System Technology (POWERCON ’98), Beijing, China, August, pp.1218–1221.

Appendices

A Fuzzy logic

Fuzzy Logic (FL) is an artificial intelligence tool that has been used over the past decade in many control applications. FL emerged from the fuzzy set theory founded by Zadeh in 1965 (Zadeh, 1965), which challenges basic assumptions of the classical theories: the sharp boundaries of classical set theory, the requirement of classical logic that each proposition be either true or false, and the additivity of classical measure theory, particularly probability theory (Zadeh, 1988). Unlike classical logic systems, FL aims at modelling imprecise modes of reasoning, that is, the human ability to make a rational decision when information is uncertain and imprecise. Fuzzy logic can be considered an extension of Boolean logic in which a condition is represented not by true or false alone but by a range of values. The inputs and the outputs are represented using membership functions, in a manner similar to set theory. FL starts with the concept of a fuzzy set: a set without a crisp, clearly defined boundary, which can contain elements with only a partial degree of membership. Membership criteria are not precisely defined for most classes of objects normally encountered in the real world.
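As a small illustration of partial membership, the sketch below evaluates a triangular membership function for a handful of values; the fuzzy set and its breakpoints are invented for the example and are not taken from the paper.

```python
# Illustrative triangular membership function for a fuzzy set such as
# "voltage is NORMAL"; the breakpoints a, b, c are arbitrary examples.

def triangular(x, a, b, c):
    """Membership grade in [0, 1]: 0 outside (a, c), 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Membership of a few per-unit voltage readings in NORMAL = (0.9, 1.0, 1.1).
for v in (0.92, 1.00, 1.08, 1.20):
    print(v, round(triangular(v, 0.9, 1.0, 1.1), 2))
```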


A fuzzy set F is characterised by a membership function, µf, that takes values in the interval [0, 1], such that the nearer the value of µf(x) to unity, the higher the membership grade of x in F (Heger et al., 1996). An FL system comprises five basic elements. First, a fuzzifier is responsible for mapping the crisp inputs from the system into fuzzy sets modelling the inputs. The second element is the knowledge base, which incorporates the required information about the system in the form of fuzzy If-Then rules. The rules are governed by the relationships between the inputs and the way the inputs combine to produce the desired output. The third element is the fuzzy model, which is the group of fuzzy sets describing each of the input and output variables. The fuzzy sets partition the universe of discourse of a given input or output variable into a group of overlapping fuzzy sets. The fourth element is the fuzzy inference system, which is the reasoning process through which the fuzzified inputs are used to activate the relevant rules. The last element is the defuzzifier, which is the mechanism by which the fuzzy output set is converted into a single output value or control parameter (a minimal sketch of this pipeline follows the list below). It is appropriate to use FL when (Cox, 1992):

•  one or more of the variables are continuous and not easily broken down into discrete segments
•  a mathematical model of the process does not exist, or exists but is too difficult to encode
•  a mathematical model of the process exists but is too complex to be evaluated fast enough for real-time operation
•  high ambient noise levels are expected in the input signals, and/or
•  engineering interpretations become highly subjective and context dependent.
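The following sketch ties the five elements together for a toy single-input, single-output system. Everything in it, from the variable names to the two-rule knowledge base and the set parameters, is a hypothetical example rather than the fuzzy classifier described in the paper.

```python
# Toy fuzzy inference pipeline: fuzzify a crisp input, fire If-Then rules,
# aggregate, and defuzzify with a centroid over a sampled output universe.

def tri(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Fuzzy model: overlapping sets for a sensor residual (input) and an
# anomaly score (output); all parameters are illustrative only.
residual_sets = {'small': (0.0, 0.0, 0.5), 'large': (0.3, 1.0, 1.0)}
score_sets = {'low': (0.0, 0.0, 0.5), 'high': (0.5, 1.0, 1.0)}

# Knowledge base: If residual is small Then score is low; If large Then high.
rules = [('small', 'low'), ('large', 'high')]

def infer(residual, samples=101):
    # Fuzzifier: membership grade of the crisp input in each input set.
    grades = {name: tri(residual, *p) for name, p in residual_sets.items()}
    # Inference + defuzzification (centroid of the clipped, aggregated output).
    num = den = 0.0
    for i in range(samples):
        y = i / (samples - 1)
        agg = max(min(grades[a], tri(y, *score_sets[c])) for a, c in rules)
        num += y * agg
        den += agg
    return num / den if den else 0.0

print(infer(0.1), infer(0.9))   # small residual -> low score, large -> high
```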

B Error expression for the Gaussian elimination algorithm

Let us derive Equation (20). To this end, let us denote:

(i)   $t_1 = \mathrm{fl}(w^* x^*)$
(ii)  $t_2 = \mathrm{fl}(t_1 / y^*)$
(iii) $t_3 = \mathrm{fl}(v^* - t_2)$

Let us simplify case (i) first:

\[
\begin{aligned}
t_1 &= \mathrm{fl}(w^* x^*) = \mathrm{fl}\bigl\{[w + GE(w)][x + GE(x)]\bigr\} = [w + GE(w)][x + GE(x)](1 + \delta_1) \\
    &= w x + x\,GE(w) + w\,GE(x) + GE(w)\,GE(x) + \delta_1\bigl[w x + x\,GE(w) + w\,GE(x) + GE(w)\,GE(x)\bigr] \\
    &= w x + x\,GE(w) + w\,GE(x) + \delta_1 w x
\end{aligned}
\]

upon ignoring the second-order terms. Similar simplifications of (ii) and (iii) yield:

\[
t_2 = \bigl[w x + x\,GE(w) + w\,GE(x) + \delta_1 w x - (w x / y)\,GE(y) + \delta_2 w x\bigr] / y
\]

and

\[
\begin{aligned}
z^* = z + GE(z) &= \mathrm{fl}(v^* - t_2) = [v + GE(v) - t_2](1 + \delta_3) \\
&= v + GE(v) + v\,\delta_3 - (w x / y)\,\delta_3 - \bigl[w x + x\,GE(w) + w\,GE(x) + \delta_1 w x - (w x / y)\,GE(y) + \delta_2 w x\bigr]/y
\end{aligned}
\]

Since $z = v - w x / y$, we can further simplify the expression for $z^*$ and write it as:

\[
GE(z) = -\frac{x\,GE(w)}{y} - \frac{w\,GE(x)}{y} + v\,\delta_3 - \frac{w x}{y}(\delta_1 + \delta_2 + \delta_3) + \frac{w x\,GE(y)}{y^2} + GE(v)
\]

Since $|\delta_i| < \varepsilon$, the last expression can be simplified to:

\[
GE(z) = \frac{x\,GE(w)}{y} + \frac{w\,GE(x)}{y} + v\,\varepsilon + \frac{3 w x\,\varepsilon}{y} + \frac{w x\,GE(y)}{y^2} + GE(v)
\]

which is identical to Equation (20).
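As a quick numerical sanity check on the first-order expression, the sketch below perturbs w, x, y and v by small, assumed errors, evaluates z* = v* − w*x*/y* in double precision, and compares the observed error with the prediction (the δ terms are of the order of machine epsilon and are negligible at this perturbation size). The numerical values are invented for the illustration.

```python
# Numerical sanity check of the first-order error expression for
# z = v - w*x/y: compare the observed error of the perturbed computation
# with GE(z) ≈ -x*GE(w)/y - w*GE(x)/y + w*x*GE(y)/y**2 + GE(v)
# (delta terms of order machine epsilon are neglected here).

w, x, y, v = 2.0, 3.0, 4.0, 10.0
ge_w, ge_x, ge_y, ge_v = 1e-6, -2e-6, 3e-6, 5e-7   # assumed input errors

z_exact = v - w * x / y
z_star = (v + ge_v) - (w + ge_w) * (x + ge_x) / (y + ge_y)

observed = z_star - z_exact
predicted = -x * ge_w / y - w * ge_x / y + w * x * ge_y / y**2 + ge_v

print(f"observed  GE(z) = {observed:.3e}")
print(f"predicted GE(z) = {predicted:.3e}")   # agree to first order
```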

