Database Designs and Metadata Management for Climate Observations and Tsunami Sea-Level Monitoring and Quality Control

Jing Zhou1, Landry Bernard2, 3, Richard Bouchard2, Kevin Kern2, Chung-Chu Teng2, Kirk Benson1, and Jack Higgs1
Science Applications International Corporation1, National Data Buoy Center2, University of Southern Mississippi3
Stennis Space Center, MS 39529 USA

Abstract-Two major buoy networks have been added to the National Oceanic and Atmospheric Administration National Data Buoy Center (NDBC) as the result of the transition of Tropical Atmosphere Ocean (TAO) Array with 55 TAO legacy buoys and the global establishment of the second-generation Deep-ocean Assessment and Reporting of Tsunami (DART®) network with 39 surface buoys and collocated subsurface bottom pressure recorders (BPR). Accordingly, their underlying data management systems were redesigned and implemented at NDBC. This paper illustrates practical designs that support the metadata management for both of the buoy networks as well as their specific needs for data management and quality control. Metadata life cycles and operational workflows are discussed and future ideas of web presentation of observation data and metadata via Open Geospatial Consortium (OGC) Sensor Model Language (SensorML) are considered.

I. INTRODUCTION
The Tropical Atmosphere Ocean (TAO) array is the National Oceanic and Atmospheric Administration’s (NOAA’s) premier climate observing system and the crown jewel observing network for El Niño-Southern Oscillation (ENSO) events. Much of the technology transitioned from the Pacific Marine Environmental Laboratory (PMEL) has been discontinued or is no longer supported by manufacturers. To ensure ongoing continuity of the TAO array, NDBC is updating the TAO buoy system [1]. Accordingly, a shore-side data management system for integrated metadata management, data processing, quality control, and data delivery was developed and implemented [2]. Similarly, NOAA’s Deep-ocean Assessment and Reporting of Tsunami (DART®) network was transitioned from PMEL to NDBC in 2004. For that transition, NDBC used its existing data management system, which provided a low-cost, expedient way to effect the transition, ensure reliable delivery of data to the Tsunami Warning Centers (TWCs), and support NDBC logistics efforts.
0-933957-38-1 ©2009 MTS
In the aftermath of the 2004 Indian Ocean tsunami, NOAA strengthened its tsunami warning capability by expanding the DART® network from six to 39 stations and upgraded all stations to the second generation DART® technology [3][4]. With the expansion of the network from six to 39 stations, detailed records of deployments could no longer be managed efficiently and to the degree required for research purposes. Using the data model developed for NDBC’s weather stations proved cumbersome for the DART® stations. Weather station data are organized according to the months and years of the calendar. Doing so with DART® deployments introduced discontinuities across deployments due to the variability of the seafloor bathymetry where the bottom pressure recorder (BPR) settled. While it was important to distinguish between deployments for DART®, it was imperative that TAO maintain continuity across deployments for climate studies. Ironically, these seeming contradictions found common ground between the two systems centered on deployments. Both systems also had a common requirement for post-deployment processing of data for archive purposes that NDBC’s weather stations did not. This called for a complete establishment of relationships between deployed sensors and the corresponding observation data and metadata to meet the scientific community’s needs. Therefore, the sensor deployment life cycle concept becomes critical to support both the TAO and DART® data management systems. Since the TAO data management system was successfully implemented based on the deployment life cycle concept, its core designs were extended to cover the specific needs for the DART® II buoy network. New databases were designed for both the TAO and DART® data management systems. Metadata collected during operations are entered into the new databases via web
interfaces by the NDBC Data Assembly Center (DAC) and calibration lab personnel. Detailed workflow procedures include property registration, equipment green tag, system integration, deployment, recovery or service, and data packaging after recovery. System validations are put in place to ensure the new databases maintain a high quality of metadata entries. Since the new database designs are sensor-oriented, in line with the concepts of Sensor Model Language (SensorML) specifications from Open Geospatial Consortium (OGC), this opens the door for NDBC to leverage the infrastructure to follow the NOAA standard data formats. The TAO and DART® metadata can be encoded using properly defined SensorML schema in the future.
In the following sections, we will discuss the two networks’ data flows, metadata requirements, design considerations, as well as future development with an emphasis on the DART® II buoy network.

II. DATA FLOWS OF BUOY NETWORKS

A. Data Flows of TAO Array

TAO sensors are mounted to a surface buoy or attached to the buoy’s subsurface conducting cable. The primary data transmitted from the TAO sensors in real-time are daily mean surface measurements (wind speed and direction, air temperature, relative humidity, and sea surface temperature) and subsurface temperatures. Optional enhanced measurements include precipitation, short- and long-wave radiation, barometric pressure, salinity, and ocean currents. Argos receives the data via the NOAA Polar Operational Environmental Satellites (POES) and NDBC retrieves Argos raw data via Telnet access to Argos. Argos also receives sensor calibration coefficients and release controls from NDBC and places converted data on the Global Telecommunication System (GTS).

The legacy data are processed daily at NDBC immediately after receipt. The first step in the daily processing is the application of calibration information. Once the data are converted to engineering units, they are subjected to a series of automated quality control checks, which produce a daily status report to be viewed by NDBC DAC personnel. The released data are then pushed to the TAO web site for public access.

The TAO refreshed buoys transmit data to the IRIDIUM® gateway via IRIDIUM® satellites (Iridium Satellite LLC) as shown in Figure 1. High temporal resolution (10-minute and hourly) measurements are available in the hourly transmitted short burst data (SBD). The IRIDIUM® gateway delivers the SBD to an NDBC communication server hosted at the National Weather Service Telecommunications Gateway (NWSTG) at Silver Spring, Maryland. A decoder is used to decode the SBD and store them in the underlying databases. Hourly or high-resolution data quality control checks are performed and the results are integrated into the same daily status report for the legacy moorings. Quality controlled data are then pushed to the TAO web site for data delivery as well as to NWSTG for GTS dissemination.

Figure 1. TAO Refresh System Data Flow
(Diagram labels: Iridium Satellite; Gateway Hawaii; NOAANet; NWSTG Silver Spring; COMMS01; SSWEB01; NWS GTS; World; NDBC Stennis Space Center; COMMS02; TAO02; TAO03)

B. Data Flows of DART® Network

Unlike a TAO system or an NDBC weather system, a DART® II tsunami system consists of a BPR and a surface buoy. Two electronic payloads, hosted in the buoy, each contain one acoustic modem and one IRIDIUM® (Iridium Satellite LLC) modem. The BPR resides on the ocean bottom and monitors water pressure using a Paroscientific pressure transducer with a resolution of approximately 1 millimeter. Samples integrated over a 15-second time window are recorded internally by the BPR and provide the base sampling interval for all real-time transmissions. Using its acoustic modem, the BPR transmits sea-level data to the surface buoy, which in turn transmits the data via IRIDIUM® satellites to ground systems at Tempe, Arizona. The IRIDIUM® ground systems deliver the data to the communication server at Silver Spring via a Router-based Unrestricted Digital Internetworking Connectivity Solution (RUDICS) connection. The software on the communication server that handles the connection and processes incoming data is referred to as the RUDICS server. A backup communication server is available at the NDBC Stennis Space Center facility in order to receive BPR sea-level data when the communication server at Silver Spring is not available [5].
The RUDICS server processes the incoming messages by adding World Meteorological Organization (WMO) routing identifiers and message timestamps to the raw message headers and then sends them via File Transfer Protocol (FTP) to the NWSTG for GTS dissemination to the TWCs and international partners. The processed messages are also sent via FTP to the Chilean Oceanographic Agency and Indonesian Agency for Assessment and Application of Technology. The RUDICS server also decodes the raw messages and stores both the raw messages and the decoded sea-level data in the underlying MySQL® (MySQL AB) database with the corresponding IRIDIUM® transmission identification (ID) and BPR transmission mode. Standard mode data are transmitted every six hours; each transmission contains 24 15-second sea-level observations taken at 15-minute intervals.

A significant capability of the DART® II technology is the two-way communications between the RUDICS server and all BPRs deployed on the sea bottom. The user interface for the two-way communications is accessible via a secured Apache Tomcat Web server on the communication server, which is referred to as the DART® Data Management Console, or simply as the console (see Figure 2 below). The console allows NDBC DAC and TWC personnel to trigger the BPRs into event mode in anticipation of possible tsunamis or to retrieve the high-resolution 15-second sea-level observations in one-hour blocks for detailed analysis. The RUDICS server, the secured Apache Tomcat Web server, and the underlying MySQL databases are the key building blocks of the console, on which the DART® metadata management relies. The metadata requirements and key design considerations for the console are discussed in the following sections.
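The standard mode schedule described above can be checked with a short sketch. This is only an illustration, not the RUDICS decoder: the function name `standard_mode_times` and the assumption that the first observation of a block falls exactly on the block start time are hypothetical.

```python
from datetime import datetime, timedelta, timezone

OBS_PER_MESSAGE = 24                    # observations per standard mode message
OBS_INTERVAL = timedelta(minutes=15)    # spacing between observations


def standard_mode_times(block_start):
    """Return the 24 observation times carried by one standard mode message.

    Assumption (not stated in the paper): the first observation falls
    exactly on block_start.
    """
    return [block_start + i * OBS_INTERVAL for i in range(OBS_PER_MESSAGE)]


times = standard_mode_times(datetime(2009, 3, 19, 0, 0, tzinfo=timezone.utc))
# 24 observations, each representing one 15-minute slot of the block
covered = times[-1] - times[0] + OBS_INTERVAL
```

Twenty-four observations at 15-minute spacing cover exactly six hours, which is consistent with one transmission every six hours.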
Figure 2. DART® Data Management Console
(Diagram labels: Iridium Ground System; RUDICS Server and Dissemination; NWSTG GTS; Database and File Management; Automated QC and Alerts; Console Web Interfaces; Public Web Interfaces; Delayed Mode Analysis Tool; NDBC Data Assembly Center and Tsunami Warning Centers; Public Users; Scientists)

III. DESIGN CONSIDERATION AND METADATA MANAGEMENT

A. Requirements for Subsurface Observations

The existing NDBC database was designed previously to support logistics with simple extensions to store limited metadata and sea-surface observation data. For a typical weather station, there were no strong demands to bind observation data to the corresponding surface sensors, and there were no demands to keep the time windows during which the surface sensors were deployed. This is because the sea surface observation data can be aggregated over a long period as long as they are collected at roughly the same location, so the temporal boundaries of sensor deployments can be ignored. The existing database model works fine with the NDBC weather stations where the emphasis is on logistics.

However, when NDBC started collecting ocean subsurface observations, such as the water temperature and salinity at certain depths from a TAO buoy, the existing database model could not satisfy two basic requirements: the need to bind a sensor’s observation data to the sensor’s calibration coefficients, and the need to know when the sensor was put into the sea and when it was pulled out of the water. The needs became clearer when a DART® buoy and its BPR with a Paroscientific transducer were deployed and later recovered. For retrospective data analysis purposes, scientists need to know when the Paroscientific transducer was deployed and the corresponding calibration coefficients at the time of deployment in order to recalculate the sea-level data based on the BPR recorded raw data counts. The deployment temporal boundaries also cannot be ignored for DART® II systems, because every time a BPR is dropped onto the sea floor the water depth may differ significantly from the previous one, even if the drop location is very close to the previous location. These basic subsurface requirements call for new database designs that fully support integrated observation data and metadata management to meet the scientific community’s needs. The TAO databases were originally designed to support the subsurface observation requirements, so it was natural to extend the database designs to support DART® sea-level observation data and metadata management.

B. Key Design Consideration
The first core concept of metadata management that the new database designs have to support is the concept of a deployment. A deployment describes the effective time window of a set of sensors deployed at a particular location. As a practical rule, if any of the sensors from a deployment are recovered or new sensors are added to a buoy system by service technicians, the deployment being serviced is marked as recovered and a new one is created to reflect the after-service status. So for a typical DART® II system deployment, when the BPR along with its Paroscientific transducer is dropped into the sea, a new deployment starts, and when the BPR is retrieved out of the water, the deployment ends. The definition of deployment start times and end times differs significantly from the existing logistics system designs, in which the operation periods of surface buoys are the central concern. For a DART® II buoy system, the surface buoy merely serves as a communication relay between the BPR on the sea floor and the IRIDIUM® satellites. Of interest to scientists are the operational periods of the BPR and its Paroscientific transducer (model 410K-101) and related metadata, such as the BPR drop locations, transducer serial numbers, and the 14 Paroscientific calibration coefficients used to reconstruct the sea-level values from the recovered raw data counts. Therefore, a deployment defines the life cycle of sensors and related metadata.

A DART® deployment, specified by a unique deployment name with a start date and a possible end date, has one deployed sensor, the Paroscientific transducer, and several other pieces of deployed equipment, such as the BPR enclosure. A Paroscientific transducer may have multiple calibration data sets if it is calibrated multiple times by its manufacturer, but only one of the calibration data sets is associated with a particular deployment. The BPR enclosure also maintains the oscillator period settings, BPR clock settings, etc. The calibration data set and oscillator period are evaluated and tested at the NDBC green tag times. The oscillator period may also be re-evaluated and maintained in the console before the actual deployment and after the deployment is recovered, if operating conditions allow. These data, together with the BPR location, are of interest to scientific data users. Other data, such as surface buoy locations and drift status, IRIDIUM® IDs and transmission status, and GTS and Web data release control, are of interest to the NDBC DAC operations. See the deployment data structure in Figure 3 below.
DART Deployment: Deployment Name and Deployment Start time and End time; Primary and Secondary Iridium IDs; BPR Latitude/Longitude and Buoy Mooring Latitude/Longitude; Buoy Drift Status, Communication Transmission Status, and Recovery/Service Status; GTS and Web Data Release Control
Deployed Buoy: Buoy Hull/System Revision; Iridium Phone Numbers; Operating Start and End; Service Logs
Deployed Sensor: Serial No.; Operating Start; Operating End; Surveyed Water Depth
Paros Calibration: Paroscientific Calibration Coefficients
Deployed Equipment: Equipment Serial No.; Payload Associated; Operating Start; Operating End
BPR Enclosure Settings: Oscillator Period and Clock Differences
Figure 3. A DART® Deployment Data Structure
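As a rough illustration of the deployment structure above, the following Python sketch models a deployment with its single deployed sensor, the one calibration data set bound to it, and its other deployed equipment. It is not the NDBC schema: all class and field names are hypothetical, and only a few of the fields listed in Figure 3 are included.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class CalibrationSet:
    calibrated_on: datetime
    coefficients: List[float]        # the text mentions 14 coefficients


@dataclass
class DeployedSensor:
    serial_no: str
    surveyed_water_depth_m: float
    calibration: CalibrationSet      # exactly one set per deployment


@dataclass
class DeployedEquipment:
    serial_no: str
    equipment_type: str              # e.g. "BPR enclosure"


@dataclass
class Deployment:
    name: str                        # unique deployment name
    site: str                        # site the deployment is assigned to
    start: datetime                  # BPR dropped into the sea
    sensor: DeployedSensor
    end: Optional[datetime] = None   # None until the BPR is retrieved
    equipment: List[DeployedEquipment] = field(default_factory=list)

    def is_active(self) -> bool:
        return self.end is None

    def covers(self, when: datetime) -> bool:
        """True if `when` falls inside this deployment's time window."""
        return self.start <= when and (self.end is None or when <= self.end)
```

Binding the calibration set to the deployed sensor, rather than to the transducer itself, reflects the rule that a transducer may be calibrated several times but only one calibration data set applies to a given deployment.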
The DART® deployment must be assigned to a site. The site as the second independent core concept describes a location where a WMO ID may be assigned. It may also describe when it is established and when it is decommissioned, as well as its usage, owner, and history. There are 39 production sites owned and maintained by NDBC DART® operations and several international sites for international DART® partners. See the global DART® array map generated by the console in Figure 4 below.
Figure 4. Global DART® Array
The green dots are the sites that have transmitted sea-level data within the last six hours. The orange dots are the sites that have not transmitted sea-level data for more than 12 hours. The red dots are the sites that have been identified as having a transmission outage by data quality analysts at the NDBC DAC.

When a BPR enters event mode via a trigger, it transmits 16 15-second sea-level values around the trigger time and one-minute averaged sea-level values from nine minutes before the trigger time to three hours and two minutes after the trigger time. In addition, one-minute averaged sea-level values in a two-hour block may also be transmitted on an hourly basis until the BPR terminates the extended event mode. The waving site in the figure indicates a trigger incident within the last 24 hours. In addition, earthquake information can be displayed in the global DART® array if the console receives the warning information from TWCs. When the NDBC DAC data quality analysts or TWC personnel want to manually trigger BPRs into event mode, a RUDICS command page can be used to select multiple DART® stations to be triggered. See Figure 5 below, which includes NDBC test stands.
Figure 5. RUDICS Command
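The event mode time window described above amounts to simple date arithmetic, sketched below. The function name `event_mode_window` and the example trigger time are hypothetical, and the sketch does not model how the BPR itself aligns samples to minute boundaries.

```python
from datetime import datetime, timedelta

PRE_TRIGGER = timedelta(minutes=9)             # nine minutes before the trigger
POST_TRIGGER = timedelta(hours=3, minutes=2)   # three hours, two minutes after


def event_mode_window(trigger_time):
    """Return (start, end) of the one-minute averaged data window."""
    return trigger_time - PRE_TRIGGER, trigger_time + POST_TRIGGER


start, end = event_mode_window(datetime(2009, 3, 19, 18, 0))
window_minutes = int((end - start).total_seconds() // 60)
```

For this example trigger the window runs from 17:51 to 21:02, a span of 191 minutes. The 16 15-second values sent around the trigger time cover 16 × 15 s = 4 minutes.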
For a typical NDBC operational site, there is one active deployment and many recovered deployments. When a user clicks a dot in the DART® array, the active deployment metadata, if available, will be displayed along with other data display and delivery interfaces. Otherwise, the most recent recovered deployment will be displayed. Sites are partitioned into six regions: Indian, West Pacific, Southwest Pacific, East Pacific, West Coast/Alaska, and Atlantic. Data quality control algorithm settings can be defined on the global DART® array, on regions, or on individual sites. Typical algorithms are range limit checks. The maximum allowed deviation of sea-level values from the surveyed water depths where the corresponding BPRs are deployed is set on the global DART® array. If a sea-level value is outside the allowed deviation, a bad quality flag and a no release flag are assigned to the value. Sea-level values with the same time tag may be duplicated because they are transmitted in different BPR modes from one buoy payload. In Figure 6 below, the green dots are the sea-level values transmitted by the BPR in the standard mode from station 51425 around March 19, 2009. The pink dots are the 15-second sea-level values transmitted in the event mode. The yellow dots are the one-minute averaged sea-level values transmitted in the event mode. The purple dots are the one-minute averaged sea-level values transmitted in the extended event mode. For a given time tag, if no sea-level value has been released by an earlier check, the value released to the web is the first one that passes all quality checks, taken in the following order: the event mode 15-second data, the event mode one-minute averaged data, the extended event mode one-minute averaged data, and the standard mode 15-second data. This holds regardless of whether the raw messages come from the primary payload or the secondary payload.
All raw BPR messages from a deployment are released to the world over GTS in real-time if the corresponding GTS release control is enabled by the NDBC DAC data quality analysts.
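The range limit check and the release order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the console's implementation: the mode labels, the names `SeaLevelValue`, `passes_range_check`, and `select_for_release`, and the numeric values in the usage example are all hypothetical, and the range check stands in for the full set of quality checks.

```python
from dataclasses import dataclass
from typing import List, Optional

# Release priority taken from the text, highest priority first.
MODE_PRIORITY = ["event_15s", "event_1min", "extended_event_1min", "standard_15s"]


@dataclass(frozen=True)
class SeaLevelValue:
    mode: str              # one of MODE_PRIORITY
    payload: str           # "primary" or "secondary"; ignored when ranking
    water_column_m: float  # decoded sea-level value


def passes_range_check(value, surveyed_depth_m, max_deviation_m):
    """Range limit check against the surveyed water depth."""
    return abs(value.water_column_m - surveyed_depth_m) <= max_deviation_m


def select_for_release(candidates, surveyed_depth_m, max_deviation_m):
    """Pick the value to release for one time tag, or None if all fail."""
    passing = [v for v in candidates
               if passes_range_check(v, surveyed_depth_m, max_deviation_m)]
    if not passing:
        return None
    # min() keeps the first of equally ranked values, so payload order
    # in the input decides ties between primary and secondary copies.
    return min(passing, key=lambda v: MODE_PRIORITY.index(v.mode))


# Usage with made-up numbers: the event mode 15-second value fails the
# range check, so the event mode one-minute average is released.
candidates = [
    SeaLevelValue("standard_15s", "primary", 5000.1),
    SeaLevelValue("event_15s", "secondary", 5900.0),
    SeaLevelValue("event_1min", "primary", 5000.2),
]
chosen = select_for_release(candidates, surveyed_depth_m=5000.0, max_deviation_m=5.0)
```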
Figure 6. Station 51425 Sea-level

In the underlying MySQL databases, each of the decoded sea-level values is associated with the corresponding BPR mode, buoy payload, quality flag, release flag, and original raw message. A trigger incident table is used to record events that happened to BPRs, together with their trigger times. Email notifications will be sent out to the NDBC DAC and TWCs when a trigger incident is recorded.

The third core concept is the concept of a sensor or equipment. A sensor or piece of equipment has a serial number, a property number, equipment type, vendor, etc. Some subsurface sensors also have associated calibration coefficients. A sensor may also have multiple measurement types, and each measurement type is described by its sensor specifications, where the measurement units, sensor sampling frequency, and averaging methods may be recorded. A bridge program is responsible for synchronizing the equipment between the logistics systems and the console MySQL databases. In addition, the logs of the two-way communications are maintained in the MySQL databases for advanced data analysis, such as the times when a BPR was reset. There are many other important concepts that metadata management needs to address, but they are skipped here for simplicity.

C. Workflow Management

Based upon the above core concepts, the MySQL databases were designed to support the DART® Data Management Console. Metadata collected during operations are entered into the new databases via the console by the NDBC DAC and lab personnel with a workflow procedure as follows.

Property registration: Register new equipment and sensors as NDBC properties in the logistics system with their serial numbers, NDBC property numbers, models, etc.

Green tag: Synchronize all properties from the logistics system to the new databases using the bridge program and record Paroscientific calibration coefficients and BPR reference oscillator periods in the new databases as they pass NDBC green tag testing.

Integration: Configure station entries for new DART® II buoy systems in the logistics system and install properties on the corresponding stations. Perform NDBC blue tag integration testing.

Deployment: Import installed properties for a given station from the logistics system into the new databases, create a new deployment, and assign it to a site. If possible, record pre-deployment BPR clocks and reference oscillator periods. Record BPR drop locations, surveyed water depths, surface buoy mooring locations, etc.

Recovery: Recover deployments with the times when their BPRs were retrieved out of the water. If possible, record post-deployment BPR clocks and reference oscillator periods. Record lab-checked reference oscillator periods for recovered BPRs later. Register 15-second raw data on the recovered BPRs in the new databases, inspect and package the data along with deployment metadata, and send the results to the NOAA National Geophysical Data Center.

Remark: During a deployment, there might be multiple services to the deployment to replace some equipment as needed. Certain replaced and recovered equipment is sent back to NDBC and subjected to the NDBC green tag and blue tag testing for reuse.

For important metadata entry activities, such as deployment and recovery management, the console performs certain data entry validation and informs users if any anomalies are found. This ensures the new databases maintain a high quality of metadata entries.

IV. CONCLUDING REMARKS
In order to support TAO and DART® data management, new databases were designed to capture the metadata required by the scientific community for ocean subsurface observations. Metadata life cycles were redefined by the new deployment concept at NDBC. Since the new database designs are sensor-oriented, in line with the concepts of Sensor Model Language and Sensor Observation Service (SOS) specifications from the Open Geospatial Consortium, NDBC is able to leverage the infrastructure to provide advanced web services to the public. The applications being developed at NDBC include NOAA Standard Data Format via SOS and SensorML, which will provide standardized sea-level data and metadata delivery mechanisms to wider user communities. In particular, PMEL has developed a Short-term Inundation Forecasting for Tsunamis (SIFT) tool for TWCs over the years. The tool operates by combining real-time tsunami data obtained from the DART® network and a set of nested numerical models to provide an estimate of tsunami arrival time and amplitudes. SIFT needs configuration metadata of the entire DART® network at any time in order to work properly. Working examples are being developed between NDBC and PMEL/TWCs to support SIFT with real-time metadata notification and exchange.

REFERENCES

[1] Chung-chu Teng, Landry Bernard, Lex LeBlanc, Bill Hansen, Richard Crout: Test and Evaluation of Refreshed Tropical Atmosphere Ocean (TAO) Buoy System. OCEANS 2008 – MTS/IEEE Kobe, April 2008.
[2] Bouchard, R., K. Kern, L. Bernard, C. C. Teng, R. Crout, D. T. Conlee, S. Birch, J. Zhou, R. Gagne, J. Boyd, T. Mettlach, R. Weir, J. Rauch, D. C. Petraitis, P. Spence, M. Follette, D. McCaffrey, M. Little, and B. Comstock: Operational transition of the data processing, quality control, and web services of the Tropical Atmosphere Ocean Array (TAO), Proc. 14th Symposium on Meteorological Observation and Instrumentation, American Meteorological Society, 2007.
[3] Bouchard, R., S. McArthur, W. Hansen, K.J. Kern, and L. Locke: Operational Performance of the Second Generation Deep-ocean Assessment and Reporting of Tsunamis (DART™ II). OCEANS 2007 – MTS/IEEE Vancouver, October 2007.
[4] Douglas Maxwell, Shannon McArthur, William Hansen, Richard Bouchard, Ian Sears, Jack Higgs and Mark Webster: U.S. Deep-Sea Tsunameter Network Fully Operational. OCEANS 2008 – MTS/IEEE Quebec, September 2008.
[5] Jing Zhou, Jack Higgs, David McCaffrey, and Raymond Locatto: DART® Data Management System Detailed IT Architecture. National Data Buoy Center Technical Document, February 2009.