Development of a Real-Time Data Quality Monitoring System Using Embedded Intelligence
Thomas Bethem, Michael Evans
NOAA Ocean Service, Silver Spring, MD 20910 USA
[email protected] [email protected]
Haleh Vafaie, PhD, Mark Shaughnessy
Northrop Grumman Information Technology, Bethesda, MD 20814 USA
[email protected] [email protected]
Abstract- Rule-based reasoning and case-based reasoning have emerged as two important and complementary reasoning methodologies in the field of artificial intelligence (AI). This paper describes the development of a real-time data quality monitoring system (CORMS AI) using case-based and rule-based reasoning. CORMS AI was developed to augment an existing decision support system (CORMS Classic) that monitors the quality of environmental data and information, and of the computer-based systems that handle them, for use in NOAA Ocean Service’s oceanographic operational products.
I. INTRODUCTION
In 1997, the National Oceanic and Atmospheric Administration (NOAA) and the NOAA Ocean Service (NOS) implemented a manned Continuous Operational Real-Time Monitoring System (CORMS) that provides 24-hour-a-day, 7-day-a-week quality control monitoring of marine environmental information, data acquisition and ingestion networks, and data dissemination servers. The primary function of CORMS is to ensure the availability and accuracy of real-time data provided by the Center for Operational Oceanographic Products and Services (CO-OPS), which are used to ensure safe, efficient and environmentally sound maritime commerce. The 1997 installation of CORMS (referred to hereafter as CORMS Classic) was designed to provide a basic graphical interface for the operator that would visually reflect the quality of the data received at six-minute (real-time) intervals from the Physical Oceanographic Real-Time System (PORTS®) [1] and in near-real-time from the National Water Level Observation Network (NWLON) [2]. Within these two systems, quality-control checks are performed to flag as flawed those individual data that fit a defined profile. The CORMS Classic system receives these sensor data, both flagged and unflagged, and presents this information in graphical and textual format to the duty watchstander. With CORMS Classic, the effectiveness of the system in monitoring data quality depends on the capacity of the duty watchstander to consistently recognize potentially inaccurate sensor data from the presented information, and to choose the correct series of remedial actions.
Generally, the ability of watchstanders to consistently make “quality decisions” (i.e., accurately assess inaccurate sensor data and follow approved methods and procedures) is directly tied to their level of implicit knowledge and experience. Thus, a distinct variation in the quality of decisions made by novice and expert CORMS Classic watchstanders has been observed. In addition, other factors, such as fatigue, distractions, and workload, tend to detrimentally impact the consistency of all watchstanders in assessing data quality and in choosing the appropriate form of remediation. Currently, the number of PORTS® sites whose data are assessed by CORMS Classic is relatively small (fewer than 10). However, as the number of sites to be assessed increases, as is intended, the quality of decisions made by all watchstanders should be expected to degrade significantly unless a system is developed to augment CORMS Classic and to assist and guide the duty watchstander. In addition, CO-OPS is planning an aggressive research and development effort that will result in the incorporation of new sensors and new applications such as the use of numerical models. To continue to add capability to the CO-OPS National PORTS® Program [3], a robust, efficient and flexible quality control system must be in place to ensure the highest quality of data and information for its users. In the following sections the two systems’ architectures are briefly described.
II. CORMS (CORMS Classic) SOFTWARE AND HARDWARE ARCHITECTURE
The purpose of the CORMS Classic system is to enable quality control oversight of network as well as oceanographic sensor data being used within real-time
marine navigation applications. To aid in safe navigation within the ports and harbors across the United States, one or more of the four available types of oceanographic sensors have been deployed at strategic locations within each estuary: water level (WL), current meter (CU), meteorological (MT), and conductivity-temperature (CT). These sensors provide the pilots with a real-time view of conditions potentially affecting their capacity to safely navigate the local waterway and shipping channels. To provide real-time information valuable to maritime applications, each oceanographic sensor generates a discrete measurement every six minutes. This information is captured in a raw, digital format and is stored until retrieved. Every six minutes, a Data Acquisition System (DAS) located centrally in the estuary queries each instrument and downloads the raw measurement data. The DAS then combines the measurements, transforms the data into a standardized text format, and compresses the data to expedite transmission to a central location in Silver Spring, Maryland. The data are transmitted electronically from each estuary to the central location through a nationwide commercial network.
Much of CORMS Classic relies upon visual inspection and interpretation by the watchstanders. CORMS Classic consists of a set of computer workstations and peripheral equipment mounted in an integrated console work area. The system comprises two UNIX workstations. One acts as a communications front-end and the other acts as the main display and processor. The communications front-end is a single-processor, low-end workstation. Its purpose is to act as a repository for real-time data files transferred from the various PORTS® sites. The main display is a dual-processor, dual-head, high-end UNIX workstation.
Its purpose is to process the incoming PUFFF (PORTS Uniform Flat File Format) [4] files received by the communications front-end and display the result to the operators via a web browser. The system uses both in-house and commercial off-the-shelf (COTS) software packages. The main display system software components can be divided into three categories: (1) data acquisition and parsing, (2) data control and (3) data display. The data acquisition and parsing code processes the PUFFF files as they arrive. The relevant information from the PUFFF files is extracted and stored in a specific format so it can be easily displayed. This code was developed using shell scripts, FORTRAN and C. The data control component allows the CORMS Classic operator to interact with the data and control the dissemination of the data. This
code was developed using C++ and Perl. The data display component allows the CORMS Classic operator to view the quality control information about the data. This code was developed using HTML, Perl and a COTS plug-in. The plug-in, Glg, allows the CORMS Classic operator to interact with the CORMS Classic interface dynamically to allow for real-time updates and refreshes. The communications front-end does not perform any processing so it has only the operating system installed.
III. CORMS (CORMS AI) SOFTWARE AND HARDWARE ARCHITECTURE
A primary justification for developing the CORMS AI system is to automate and standardize the process of interpreting, from presented sensor data, the existence of network and oceanographic sensor data quality failures. Additional challenges associated with this effort include the increasing scope of instrument types, number, and reporting networks against which the system must provide a consistent application of quality control and network monitoring. As mentioned, CORMS AI has been designed to extend the current capacity of CO-OPS to deliver discretely generated oceanographic sensor data samples to aid the safe navigation of local waterways and shipping channels. Consistent with the CORMS Classic system, six-minute data samples from the four types of oceanographic sensors are collected and processed via the data acquisition system (DAS) located at each estuary, and are then transported to a central location in Silver Spring, MD. However, sensor data samples are also collected and transported to the central location via a network based upon the periodic upload of station data to National Weather Service (NWS) satellites (i.e., GOES). The CORMS AI system is designed to accept sensor sample data arriving in these various formats. Data are converted to a common format, real-time quality control flags are applied to sensor samples that exceed defined thresholds, and the data are then loaded into a short-term Operational Data Store (ODS). The ODS is structured to facilitate and expedite delivery of a sensor’s most recent sample to all downstream applications serving real-time business uses. After loading into the ODS, but prior to dissemination, an assessment of each sensor data sample is performed using an artificial intelligence application. This application applies a set of expert “rules” to the collection of samples to determine if there is a significant likelihood of a network interruption or data
quality error. Reasoning conclusions are written back to the database, from which alerts regarding the nature of the potential failures and the required actions to be taken by watchstanders are generated. In addition, CORMS AI further extends current capabilities by providing a separate set of functionality that enables case-based reasoning to be applied when troubleshooting the causes of sensor and network failures. By design, CORMS AI leverages as much of the existing CO-OPS systems architecture as possible. The suite of CO-OPS applications that provide data acquisition, parsing, data control and display capabilities under CORMS Classic also play a major
role within the CORMS AI system. A Windows NT workstation was added to the CO-OPS systems architecture to provide a platform for testing and deploying the AI software. The preliminary version of the ODS has been implemented in Sybase 12.5 running under Windows NT, but plans are to migrate to Sun Solaris 8 to ensure consistency with other system database implementations. A specialized COTS software package was purchased for developing the AI components of the system. ART Enterprise (a product of MindBox) was selected because of its capacity to support both rule-based and case-based reasoning. Fig. 1 shows the CORMS system’s high-level architecture.
Fig. 1. CORMS System
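The assessment flow described in this section (threshold flags, a store holding each sensor's most recent sample, then expert rules that yield alerts) can be illustrated with a minimal Python sketch. It is not the ART Enterprise implementation: the limits, the 12-minute staleness window, the fact names, and the three rules are all hypothetical, chosen only to show rules being re-applied until no new conclusions appear.

```python
# Hypothetical value limits per sensor type; actual CO-OPS thresholds are not given here.
LIMITS = {"WL": (-2.0, 5.0), "CT": (-5.0, 40.0)}


def qc_facts(latest, now_minute, max_age=12):
    """Turn the most recent sample per (station, sensor) into facts for the rules."""
    facts = set()
    for (station, sensor), (minute, value) in latest.items():
        low, high = LIMITS[sensor]
        if now_minute - minute > max_age:        # assumed staleness window
            facts.add(("stale", station, sensor))
        if not (low <= value <= high):
            facts.add(("out_of_range", station, sensor))
    return facts


def rule_alert(facts):
    """Conclusions from other rules become watchstander alerts."""
    out = set()
    for kind, station, detail in facts:
        if kind == "network_interruption":
            out.add(("alert", station, "check DAS communications"))
        elif kind == "quality_error":
            out.add(("alert", station, f"inspect {detail} sensor"))
    return out


def rule_network(facts):
    """Illustrative rule: two or more stale sensors at a station suggest a network problem."""
    stale = {}
    for kind, station, sensor in facts:
        if kind == "stale":
            stale.setdefault(station, set()).add(sensor)
    return {("network_interruption", st, "*")
            for st, sensors in stale.items() if len(sensors) >= 2}


def rule_quality(facts):
    """Illustrative rule: a fresh but out-of-range sample suggests a data quality error."""
    return {("quality_error", st, se) for kind, st, se in facts
            if kind == "out_of_range" and ("stale", st, se) not in facts}


def forward_chain(facts, rules):
    """Re-apply every rule until a full pass adds no new fact."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(facts) - facts
            if new:
                facts |= new
                changed = True
    return facts


# Most recent sample per sensor: (minute received, value). Values are made up.
latest = {("A", "WL"): (6, 9.9), ("B", "WL"): (0, 1.0), ("B", "CT"): (0, 15.0)}
conclusions = forward_chain(qc_facts(latest, now_minute=18),
                            [rule_alert, rule_network, rule_quality])
alerts = sorted(f for f in conclusions if f[0] == "alert")
print(alerts)
```

The alert rule is listed first on purpose: it finds nothing on the first pass and fires only after the other two rules have added their conclusions, which is the kind of re-evaluation a rule engine performs automatically.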
IV. CORMS AI DEVELOPMENT APPROACH
In this section we describe the approach for the development of the CORMS AI application using ART Enterprise. An intelligent system has been built that augments the functionality of the CORMS Classic system by delivering decision support guidance and directives to watchstanders when a sensor or network interruption is identified. The domain in which CORMS AI performs its analysis and interpretation includes but is not limited to marine environmental information, data acquisition and ingestion networks, and data dissemination servers. For the data acquisition and ingestion networks as well as data dissemination servers, there is a well-defined set of rules that can be used for problem detection and troubleshooting. Hence, it was determined that rule-based logic is the optimal method for identifying network interruptions. The marine environmental data, as previously described, are collected using sensors that are strategically located at various estuaries across the United States. The current CORMS Classic application incorporates a sophisticated network of sensor data range criteria, real-time data processing, and a system of data quality flags that provide notification of warnings and potential sensor failures to watchstanders. The information indicated by these flags ranges from notifying the watchstander about the operation of a sensor to indicating when an instrument’s measurement has failed a given data quality check for real-time use. By interpreting these presented symbols of sensor data quality, the watchstander must decide whether to take action (or non-action) and, if so, which action is appropriate given the nature of the failure. At this point, although there are some guidelines for evaluating the quality of the data reported by a sensor, the final evaluation depends on an expert’s decision.
Therefore, it was determined that a case-based reasoning approach is the optimal method for monitoring the sensor data.
A. Rule-based Reasoning
A rule-based system represents domain knowledge in terms of a set of rules that indicate the actions and conclusions in different situations. A rule-based system consists of a set of rules, a set of facts, and an inference engine for controlling the application of the rules, given the facts or conditions. Whenever the
conditions in a rule change, the inference engine reevaluates all rules that contain those conditions. If the actions taken by those rules impact other rules’ conditions, then those rules are reevaluated. The inference engine offers the pattern matching capability that specifies when a rule should “fire”. This eliminates the need for the complex navigational programming required by non-rule-based systems. All of the rules used by the system are stored in a rule base. Rule-based systems have many advantages over traditional systems. Rule bases can be easily updated since rules are independent of each other. Because of their IF-THEN structure, they are very easily understood by both programmers and experts.
B. Case-based Reasoning
Case-based reasoning (CBR) is based on the notion that human expertise consists not merely of formal structures like rules, but also of experience. An expert reasons by relating a new problem to previous ones [6]. Case-based reasoning makes decisions based on the actions taken in similar problems previously encountered. The decision is based on comparing the current situation to the circumstances of past problems, and looking for close matches to determine the best action to take. The past experiences are stored in case bases, which are databases of cases. Each case describes the problem and its specific features and values, as well as an appropriate action for that problem. Case-based reasoning has several advantages over reasoning with rules. The main advantage is that it is relatively easy to set up a knowledge base. Experience has shown that it is commonly very difficult to capture knowledge of a problem domain in a set of rules if there are no well-defined standard operating procedures. However, common examples of problems in a domain, with their solutions, are either available or can be acquired. Because changing circumstances can be accommodated by adding new cases to the case base, a CBR system can learn from experience.
Domain experts can easily maintain the case base, because there is minimal programming involved, and the CBR system automatically handles contradicting cases. Expanding a rule-based system, on the other hand, is much more difficult: adding one rule often means rewriting a large part of the rules [6, 7]. Case-based reasoning systems can be built without a knowledge acquisition bottleneck, since a case base can be started from a single past experience or case.
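The retrieve-and-learn cycle of case-based reasoning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature names, flag values, and remedial actions are invented, and the similarity measure (fraction of matching features) is one simple choice among many, not the measure used by ART Enterprise.

```python
from dataclasses import dataclass


@dataclass
class Case:
    features: dict   # problem description: feature name -> value
    action: str      # remedial action recorded for this problem


def similarity(a, b):
    """Fraction of features, over the union of both key sets, on which two problems agree."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(1 for k in keys if a.get(k) == b.get(k)) / len(keys)


class CaseBase:
    def __init__(self):
        self.cases = []

    def add(self, features, action):
        """Learning step: store a solved problem as a new case."""
        self.cases.append(Case(dict(features), action))

    def retrieve(self, problem):
        """Return (closest case, score), or (None, 0.0) if nothing matches at all."""
        best, best_score = None, 0.0
        for case in self.cases:
            score = similarity(problem, case.features)
            if score > best_score:
                best, best_score = case, score
        return best, best_score


# Hypothetical cases; none of these values come from CORMS itself.
cb = CaseBase()
cb.add({"sensor": "WL", "flag": "flat_line", "comms": "ok"},
       "dispatch field crew to inspect sensor")
cb.add({"sensor": "WL", "flag": "missing", "comms": "down"},
       "contact network provider")

problem = {"sensor": "CT", "flag": "missing", "comms": "down"}
best, score = cb.retrieve(problem)
print(best.action, round(score, 2))

# Once the new problem is resolved, it is stored and matches exactly next time.
cb.add(problem, "restart DAS modem")
best_after, score_after = cb.retrieve(problem)
print(best_after.action, score_after)
```

Note that the case base becomes useful with very few entries and grows by appending, which mirrors the point that no up-front rule elicitation is needed.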
V. BENEFITS
Deployment of the fully developed CORMS AI system will allow for:
• More timely resolution of data, information, engineering and computer-based problems.
• The ability to monitor more PORTS® and NWLON installations without compromising quality.
• More consistency in operator responses.
• A standard set of operating procedures to train new operators.
• Reduced subjectivity in data quality interpretations.
• The ability to isolate trends and subsequently “teach” the system over time and create new rules and cases based upon history and experience.
Ultimately, users of CO-OPS real-time products will benefit by receiving high-quality data and information that will lead to more economically efficient and safe maritime commerce.
VI. SUMMARY AND CONCLUSIONS
An initial prototype has been designed, developed, tested and implemented. This system has shown the feasibility of applying AI techniques to our problem. The functionality delivered through this prototype represents an improvement over the current CORMS Classic system by providing decision support guidance and directives outlining the actions (or non-actions) that should be taken as a result. The next step is to evaluate the system for several months and begin to plan for addressing increasingly difficult quality control issues. It is anticipated that the knowledge base will continue to grow. The case-based component of CORMS AI will also be beneficial during the troubleshooting process. In this situation, previous problems, along with their solutions, are stored as cases that can be retrieved if the same or a similar problem occurs. The system will go through rigorous and thorough evaluation and testing before it becomes operational.
The successful implementation of this system will provide for economically efficient and safe maritime commerce.
REFERENCES
[1] Appell, G., T. Mero, T. Bethem and G. French, “The Development of a Real-Time Port Information System,” IEEE Journal of Oceanic Engineering, Vol. 19, No. 2, 1994.
[2] National Ocean Service, 1991: “Next Generation Water Level Measurement System (NGWLMS) A site Design, Preparation, and Installation Manual,” Office of Oceanography and Marine Assessment, National Ocean Service, January 1991.
[3] Real Time Tide and Current Data Systems in United States; Implementation Status Safety and Efficiency Needs Expansion Plan - A Report to Congress. July.
[4] Evans, M., G. French, T. Bethem, “PORTS® Uniform Flat File Format (PUFFF),” Center for Operational Oceanographic Products and Services, National Ocean Service, Third Revision, November 1998.
[5] Vafaie, H., M. Shaughnessy, T. Bethem, and J. Burton, “Evaluation of High-End Decision Support Tools for a Real-time Monitoring System,” Proceedings of SCI2001, Orlando, FL, July 2001.
[6] Tenback, R.H., “A comparison of Similarity Measures for Case-Based Reasoning,” Master’s thesis, Utrecht University, 1994.
[7] Watson, I., “Case-Based Reasoning: Techniques for Enterprise Systems,” Morgan Kaufmann Publishers, July 1997.