Copyright 2002. Published in the Proceedings of the Advanced Simulation Technologies Conference (ASTC), San Diego CA USA, April 2002. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the Society for Computer Simulation. Contact voice: (858) 277-3888; fax: (858) 277-3930; E-mail: [email protected] or write SCS, 4838 Ronson Court, Suite L, San Diego, CA 92111-1800, USA.
On Integrating Human-In-The-Loop Supervision Into Critical Infrastructure Process Control Systems
Alex Korzyk Department of Business & Center for Secure Dependable Systems University of Idaho
[email protected]
William Yurcik Department of Applied Computer Science & Survivability-Over-Security (SOS Group) Illinois State University
[email protected]
Keywords: human-computer interaction, infrastructure protection, process-control engineering
Abstract: Post-mortem investigations report that most industrial accidents are due to human error, with many originating in the lack of effective control system interfaces for human supervision of embedded sensors. The need for more sophisticated human interface facilities based on human usability and human performance design is most pressing. This paper examines this problem as a necessary component in systems currently being designed to protect critical national infrastructures.
INTRODUCTION
One of the results of the Y2K bug non-event was the realization that 98% of all processors are not in traditional computer platforms [3]. The focus on traditional computer platforms (PCs, mainframes, laptops, PDAs) has discounted the role of ubiquitous embedded processing systems. The division between man and machine so apparent two decades ago continues to blur, with the sale of embedded computing devices dwarfing traditional computer sales (by two orders of magnitude). Driven by decreasing costs and increasing capabilities, computing and communications technology is being embedded into a growing range of devices linked together with networks [2]. These embedded devices and sensors will allow information to be collected, shared, and processed in unprecedented ways. Broad applicability gives these systems the potential to transform the way people interact with and manage systems.
Another result of the Y2K bug non-event was the inspection and review of virtually all commercial software, with the USA spending 10 times what the rest of the world combined spent to address this issue. These code reviews and potential fault scenarios brought the following realizations: (1) all critical systems rely on underlying computer control, (2) survivability has not been designed into critical systems, (3) the impact of fault events on critical systems has been dramatically increased due to concentration based on economies-of-scale, and (4) fault events can cascade from one critical system in a particular domain to another critical system in a different domain due to interdependencies. Fault events can be the result of either a random reliability failure or a deliberate malicious attack.
Specifically, the President designated the following critical systems as national critical infrastructures: the electric power grid, the public-switched telephone network, water supply systems, banking and financial systems, transportation systems (air, land, and water), oil and gas production and distribution systems, emergency services, and government continuation services [17]. Elaborate embedded detection and monitoring systems are being designed and deployed for these national critical infrastructures with the goal of detecting faults quickly in order to respond and recover. However, there has been little or no research into improving the performance and survivability of human supervision of embedded sensors. Where ten operators used to monitor a critical system, only one may be left, with no upgraded processing capabilities. Some control system interface facilities that were originally designed to be monitored by humans have now been completely automated.
The remainder of this paper is organized as follows: Section 2 gives empirical evidence of the need for more sophisticated human interface capabilities based on power grid problems. Section 3 summarizes previous work on system support of human supervision of embedded sensors. In Section 4 we propose a model-based interface architecture including repository and display capabilities. We close with a summary, conclusions, and directions for future research in Section 5.
CASE STUDY: NORTH AMERICAN POWER GRID
The recent problems of the electric power grid provide empirical evidence to support our hypothesis of the need for enhanced human supervision of embedded sensors. The privatization of electric utilities resulted in the management and operation of new functionality within the electric grid without corresponding investment in new control systems. One example of this new functionality is the ability of one power region to decide when and how much electricity to purchase from another region in a spot auction market in the event of a power outage during a peak demand period.
For instance, in the summer of 2001 California utility companies purchased electricity from Nevada, Oregon, and Washington to meet demand. In some cases widespread outages occurred because aging control systems were unable to support this new functionality [5]. This is not only a recent problem exacerbated by deregulation: the “Great America West Blackout” on August 10, 1996 resulted from the cascading effects, under computerized (human-out-of-the-loop) control, of a minor local excess load problem in Oregon [1]. The fragility of the power grid to minor faults is exemplified by this failure, which blacked out 11 states and 2 Canadian provinces (3 million square miles), affected 100 million people, and caused $1.5 billion in damages [4]. In more detail, lower-level operators became aware of a problem caused by automated control systems and reacted by sending more power (instead of less) into Oregon. The Oregon electric grid chief operator had no idea he could override this errant command from the lower-level operators, and the result was a huge power surge (500+ megawatts). The local regional grid could not contain this power surge, which eventually led to widespread catastrophic power failures. It is an open question whether an improved human-in-the-loop interface facility would have reduced or prevented this failure and its cascading effects, but poor human supervision of embedded sensors was designated as one of the contributing factors [8].
PREVIOUS WORK
The monitoring controls of large complex systems resemble chess pieces that can be positioned to counter a threat, possibly preempting another threat, and possibly preventing a third threat. This introduces the concept of strategy to advanced control systems. With the shift from single-function sensors to network-centric computing, removing a human-in-the-loop as the ultimate system operator will require advanced changes to the current control facilities [15].
Prior to digital electronics, control rooms had analog gauges that provided warnings with colored lights. With the advent of digital electronics, control rooms became walls of screen displays stacked horizontally and vertically for humans to scan, with each screen representing the output of a single-function sensor. Control rooms have now evolved to large screens with multiple windows. Through the use of hierarchical abstraction, information from dozens of sensors can be fused into a smaller number of windows.
Designing the interface facility for a human-in-the-loop still presents problems. First, the human operator may be exposed to different technologies that are continuously changing.
Second, the same control system interface will be used around the country or perhaps globally, forcing compromises and a lack of individual personalization. Third, the human operator may not be trained to properly operate the controls. Fourth, computers process information at speeds orders of magnitude faster than human sensory decision-making [13].
Process control provides a systemic framework for the design of sensor data fusion. The design of sensors can be based on the information that needs to be collected and the functions to be monitored for each component of the process control system. Process control activities include (summarized from [18]):
1) System control to assure normal operation through diagnostic control. Diagnostic control monitors system operation for faults/attacks and takes control when needed for a preemptive action or restoration.
2) System planning to design and specify limits of normal operation based on historic data of inspection and routine maintenance.
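The two activities listed above can be illustrated with a minimal sketch. It assumes a single scalar sensor and assumes that the "limits of normal operation" are taken as the historic mean plus or minus k standard deviations; that rule, the function names plan_limits and diagnose, and the sample frequency readings are illustrative assumptions and are not taken from [18].

```python
from statistics import mean, stdev


def plan_limits(historic, k=3.0):
    """System planning: derive limits of normal operation from historic data.

    Assumption: normal operation is mean +/- k standard deviations.
    """
    m, s = mean(historic), stdev(historic)
    return (m - k * s, m + k * s)


def diagnose(readings, limits):
    """Diagnostic control: return (index, value) for each out-of-limit reading."""
    low, high = limits
    return [(i, r) for i, r in enumerate(readings) if not (low <= r <= high)]


# Hypothetical grid-frequency readings in Hz.
limits = plan_limits([59.9, 60.0, 60.1, 60.0, 59.95, 60.05])
faults = diagnose([60.0, 60.02, 61.5], limits)
```

In a real control system each flagged reading would trigger a preemptive or restorative action; here the sketch only reports which readings fall outside the planned limits.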
In an analogy between information assurance and process control, attack prevention is analogous to system planning, attack detection/assessment/reaction is analogous to diagnostic control, and vulnerability testing is analogous to routine maintenance [18].
Conventional process control uses feedback loops requiring a human response between sensor operations. A change in programming that allows the control system software to interact directly with sensors, using model-based systems instead of feedback loops, may remove the human-in-the-loop. Model predictive control research continues to struggle with solving “the open-loop nature of the optimal control problem and the implicit feedback produced by the receding horizon implementation” [11].
Fusing data from multiple sensors for presentation to a human operator requires knowledge discovery. The fused data rapidly becomes a vast source of accumulated information that is intractable to traditional statistical analysis. Knowledge discovery techniques have radically improved in the past decade to analyze and interpret greater volumes of raw data. These techniques will be needed to capture functional aspects hidden within control system fragments. The knowledge discovery technique of data mining, which relies on the use of first-generation machine learning algorithms (decision trees/Bayesian networks), may allow sensor data to be combined into new, previously unknown classifications [9]. However, data mining is currently a reconstruction activity rather than a proactive decision support tool. The data mining techniques of clustering and anchoring objects provide information to facilitate templates for user interface GUI design, based on sensitivity identification of the important factors to monitor.
Other techniques are also promising. Genetic algorithms may provide prediction of threat sequences [14]. Dynamic logic programming uses background data from a repository to induce hypotheses that can contribute to control system policies [6]. Both genetic algorithms and dynamic logic programming can use fractal imaging of increasingly finer detail to visualize data.
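The receding horizon idea quoted from [11] in this section can be made concrete with a minimal sketch. At each step an open-loop optimal control problem is solved over a short horizon, only the first control is applied, and the plan is recomputed from the new state, which is where the implicit feedback comes from. The scalar process model, its coefficients, the candidate control values, and the cost weights below are all illustrative assumptions and are not taken from [11].

```python
from itertools import product


def predict(x, u, a=0.9, b=0.5):
    """One-step model of a hypothetical scalar process: x_next = a*x + b*u."""
    return a * x + b * u


def plan(x, setpoint, horizon=3, candidates=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Open-loop step: exhaustively search control sequences over the horizon."""
    best_cost, best_seq = float("inf"), None
    for seq in product(candidates, repeat=horizon):
        xi, cost = x, 0.0
        for u in seq:
            xi = predict(xi, u)
            # Penalize tracking error plus a small control effort term.
            cost += (xi - setpoint) ** 2 + 0.01 * u ** 2
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq


def run(x0, setpoint, steps=20):
    """Receding horizon: apply only the first planned control, then re-plan."""
    x, trajectory = x0, [x0]
    for _ in range(steps):
        u = plan(x, setpoint)[0]
        x = predict(x, u)  # the model stands in for the real plant here
        trajectory.append(x)
    return trajectory
```

Calling run(0.0, 2.0) drives the state from 0 toward the setpoint of 2 without any operator response between sensor readings. Exhaustive search is used only to keep the sketch short; practical model predictive controllers use numerical optimization instead.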
MODEL-BASED INTERFACE FACILITIES
In this section we examine different model-based interface facilities. The first type is a consolidative model, which provides the operator with information about normal operation and highlights conditions known to be precursors to a possible failure. The second type is simulation-based, which lets the operator pre-test certain parameter corrections made to a sensor reading. The third type is consultative, which provides operators with a recommended course of action based on the judgment of an expert user. These three types of models supplement the capabilities of conventional process control (status reports, gauges, direct sensor data) and approach the limit of interactive or human-centric computing.
The next level is machine/computer or network-centric systems based on data fusion of embedded sensors. Network-centric systems use sensors and actuators to go beyond reacting to readings and shape the environment [16]. In the case of monitoring critical infrastructures, many attacks can be prevented since automated response to external stimuli will be several orders of magnitude faster than human reaction times. The human-in-the-loop operator will move up to a supervisory role. Network-centric systems will also be better at responding to critical infrastructure threats [7]. These threats come from hostile parties with the potential to exploit a vulnerability to cause damage. There are three types of threats: (1) National; (2) Shared; and (3) Local.
National threats are those directed at the critical infrastructures that PDD 63 considers vital to defense and economic security [17]. The 8 specifically designated national critical infrastructures are (in alphabetical order): (1) banking and finance, (2) electrical power, (3) emergency services, (4) gas and oil distribution, (5) government services continuity, (6) telecommunications, (7) transportation, and (8) water supply [7]. The President's Commission on Critical Infrastructure Protection (PCCIP) has further combined these eight infrastructures into 5 sectors: (1) banking and finance, (2) energy, (3) information and communication, (4) physical distribution, and (5) vital human services [5]. Shared threats include: terrorists, hate organizations, industrial espionage, organized crime, malicious mobile code, and network failures. Local threats include: insiders, vendors/contractors, consultants, institutional crackers, recreational crackers, natural disasters, accidents, and system failures [7]. Most of these threats must be countered in real-time without advanced planning based on previous knowledge.
More advanced models are being proposed for future interface facilities. One type of advanced model is integral, providing responses for a fully predicted event (the threat is known and the counter-response provides complete recovery). If an interface control system uses the integral model, there is no need for a human-in-the-loop. A second type of advanced model is categorical, providing responses to a partially predicted event (threats are known within a certain range and responses are planned for this “category” of threats). If an interface control system uses the categorical model, the system provides pertinent information to the human-in-the-loop, including possible responses to the category of threat. A third type of advanced model is template-based, providing responses to unanticipated events.
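The distinction among the three advanced models can be sketched as a simple dispatch on how well a threat was predicted. This is a minimal illustration only: the dictionary-based threat record, the function names select_model and handle, and the example threat and response names are hypothetical and do not come from the cited work.

```python
from enum import Enum


class Model(Enum):
    INTEGRAL = "integral"        # fully predicted event
    CATEGORICAL = "categorical"  # partially predicted event
    TEMPLATE = "template"        # unanticipated event


def select_model(threat, known_responses, category_responses):
    """Pick the advanced model that matches how well the threat was predicted."""
    if threat["name"] in known_responses:
        return Model.INTEGRAL
    if threat.get("category") in category_responses:
        return Model.CATEGORICAL
    return Model.TEMPLATE


def handle(threat, known_responses, category_responses):
    """Return (model, needs_human, payload) for a threat record."""
    model = select_model(threat, known_responses, category_responses)
    if model is Model.INTEGRAL:
        # Known threat with a complete counter-response: no human needed.
        return model, False, known_responses[threat["name"]]
    if model is Model.CATEGORICAL:
        # Known category: offer the planned responses to the operator.
        return model, True, category_responses[threat["category"]]
    # Unanticipated event: pass basic information to the operator.
    return model, True, {"name": threat["name"], "severity": threat.get("severity")}


# Hypothetical repository contents.
known = {"relay_trip": "reclose_breaker"}
categories = {"overload": ["shed_load", "reroute_power"]}
result = handle({"name": "line_sag", "category": "overload"}, known, categories)
```

Here the unrecognized threat falls into a known category, so the sketch returns the planned responses for the operator to choose from rather than acting on its own.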
The template-based model incorporates background knowledge to form hypotheses that enhance control system policy [10]. If an interface control system uses the template-based model, the system provides basic information to the human-in-the-loop and then corroborates the human decision with automation before acting upon it. Should the template-based control system fail to act upon the human decision then, provided that sufficient time is left in the decision window, a higher-level control authority may override with a new response. If the human-in-the-loop provides no response before the end of the decision window, the template-based control system would select a course of action based on the severity of the threat.
Object-Oriented Repository
The information about threats and responses will be stored in a threat/response repository as objects. This repository provides information through a graphic console monitored by a human operator. All three advanced model-based interface facilities (integral, categorical, template-based) require that relevant information be contained in a common object repository [6]. Relevant information includes the type of threat, intentions, method of attack, tools used, vulnerability exploited, alternative responses, type and location of targets, and event results. Ideally, one distributed database would contain known threats, vulnerabilities, and associated events that exist in shared databases. Many events in this repository will never become public knowledge, some events will become public knowledge only after a public event, and some events will become public knowledge only after information is leaked [15]. The information is sanitized in order to be shareable. The information in the database can be abstracted into one of the three advanced models (depending on whether the threat is recognized).
The objects containing the event information stored in the common object repository also contain scripts of response logic used to counter the threat. If a situation is impending, the system will suggest a preemptive action. The categorical model examines objects and may request the human-in-the-loop to provide missing information to create a response script for a category of threats. When the system does not recognize the threat, it will rely upon policy objects previously stored within the system. Should a human-in-the-loop take an action that puts the system in peril, a corroborative script will interrupt and suggest a higher-order control authority. The system will also have stored scripts that execute restorative actions. Should the human-in-the-loop fail to take restorative action, a survivability script will take over control of the system. A Markov model by Sutherland executes a survivability script with or without a human-in-the-loop [15].
Graphics and Geographic Display Capabilities
Most text-based objects are viewed through a conventional 2-dimensional interface. Other systems are limited by the amount of data and by the use of a special third-party interface. Some objects contain multi-dimensional information warranting a 3-dimensional human interface. The ideal human interface would allow physical tangible sensations to be added to 3-D imagery [12]. This would turn the network of sensors into a surreal environment.
Network-centric data may be visualized using geographic data. The fundamental elements include points, lines, areas, and surfaces with functions such as clustering (pointing to an area), buffering (drawing a line around an area), and contouring (points on a surface). Clustering groups threat data into categories. Buffering creates a buffer zone around a category of threats.
Contouring shows the density of threat demographic information. Current views are the equivalent of tunnel vision compared to advanced geographic display capabilities: human operators can traverse relevant geographic information visually from treetop level, from one mile high, or from space.
Geographic data in control systems can be used for geography, demographics, and behaviors. Geography includes maps, networks, codes, density, centroids, and boundaries; exploded maps offer more detail based on scale. Demographics include demographic profiles, population density, and demographic forecasts. Behavior includes transactions, surveys, message response, delivery locations, and traffic volumes. Since geographic data is temporal in nature, some criteria must be set to determine the data refresh rate and whether older data is sufficient. Examples of temporal data include subscriber lists, warranty claims lists, and complaint databases. If proprietary information is unavailable, supplemental representative data can be substituted.
Figure 1. Network Centric Advanced Control System
SUMMARY
Human interface facilities are a crucial part of critical infrastructure protection and yet have been a neglected area of research. In this paper we describe model-based human interface facilities to adequately support critical infrastructure process control systems. While the capabilities of embedded sensors are increasingly providing unprecedented opportunity for large-volume data fusion, interpreting the result and acting upon it is a decision-making process currently presented to a human-in-the-loop. We address maintaining a human-in-the-loop given advances in embedded sensors. In summary, maintaining a human-in-the-loop will require significant changes to current interface facilities, and certain real-time event situations dictate removing the human-in-the-loop altogether in favor of preplanned scripts. We outline a broad systems-level approach that will be needed to solve the problems presented by human supervision of embedded sensor fusion and describe enabling technologies aimed at achieving functional protection of critical national infrastructures.
References
[1] Amin, M. “Toward Self-Healing Infrastructure Systems,” IEEE Computer, Vol 33 No 8, Aug. 2000, pp. 44-53.
[2] Computer Science Telecommunications Board/National Academy of Science. Embedded Everywhere: A Research Agenda for Networked Systems of Embedded Computers. National Academy Press, 2001.
[3] Estrin, D., R. Govindan, and J. Heidemann. “Embedding the Internet,” Comm. of the ACM, Vol 43 No 5, May 2000, pp. 38-41.
[4] Grudinin, N. and I. Roytelman. “Heading off Emergencies in Large Electric Grids,” IEEE Spectrum, Vol 34 No 4, pp. 43-47.
[5] Jones, A. “The Challenge of Building Survivable Information Intensive Systems,” IEEE Computer, Vol 33 No 8, pp. 39-43.
[6] Korzyk, A.D. Sr. “Towards Security of Integrated Enterprise System Management,” 22nd National Information Systems Security Conference (NISSC), 1999, pp. 415-431.
[7] Korzyk, A.D. Sr. “Towards a Cybernetic Perspective for Enterprise System Security,” 4th Multiconference on Systemics, Cybernetics, and Informatics, 2000, pp. 72-77.
[8] Liu, C-C., J. Jung, G.T. Heydt, V. Vittal, A.G. Phadke. “Strategic Power Infrastructure Defense (SPID) System,” IEEE Control Systems, Vol 20 No 4, Aug. 2000, pp. 40-52.
[9] Mitchell, T.M. “Machine Learning and Data Mining,” Comm. of the ACM, Vol 42 No 11, Nov. 1999, pp. 30-36.
[10] Muggleton, S. “Scientific Knowledge Discovery Using Inductive Logic Programming,” Comm. of the ACM, Vol 42 No 11, pp. 43-46.
[11] Rawlings, J.B. “Tutorial Overview of Model Predictive Control,” IEEE Control Systems, Vol 20 No 3, pp. 38-52.
[12] Salisbury, J.K. “Making Graphics Physically Tangible,” Comm. of the ACM, Vol 42 No 11, Nov. 1999, pp. 75-81.
[13] Shneiderman, B. “Universal Usability,” Comm. of the ACM, Vol 43 No 5, May 2000, pp. 85-91.
[14] Schulze-Kremer, S. “Discovery in the Human Genome Project,” Comm. of the ACM, Vol 42 No 11, Nov. 1999, pp. 62-64.
[15] Sutherland, J.W. “Addressing the Shortfall in Process Control Systems,” Intl. Journal of Technology Management, Vol 14 No 6/7/8, 1997, pp. 670-700.
[16] Tennenhouse, D. “Proactive Computing,” Comm. of the ACM, Vol 43 No 5, May 2000, pp. 43-50.
[17] The White House. “Defending America’s Cyberspace: National Plan for Information Systems Protection,” Critical Infrastructure Assurance Office (CIAO), Washington D.C., 2000.
[18] Ye, N., J. Giordano, and J. Feldman. “A Process Control Approach to Cyber Attack Detection,” Comm. of the ACM, Vol 44 No 8, pp. 76-82.