Identity and Functionality in the Common Instrument Middleware Architecture
Donald McMullen and Thomas Reichherzer
Pervasive Technology Labs at Indiana University
501 N. Morton St., Suite 224, Bloomington, Indiana 47404 USA
E-mail: [email protected], [email protected]
Abstract. The Common Instrument Middleware Architecture (CIMA) is middleware for integrating scientific instruments, individual sensors or actuators, and sensor networks into computing and storage grids. A key element of the design of CIMA is the representation of the identity and functionality of network accessible instruments and sensors. This paper presents the approach used in CIMA to represent instruments and sensors and how these representations are used by applications to build functional models of CIMA-enabled instruments and sensors. One potential outcome of this work is the development of a “Semantic Web” for instruments, allowing users from all segments of society to find and use instruments, sensors and other real-time data sources that can answer critical questions about the real world.
1 Introduction
Scientific researchers by and large consider experimental or observed data to be the raw material from which knowledge is extracted through a process that might involve reduction, fusion, and analysis. This focus on data per se concentrates on the latter part of the chain, the reduction and analysis phases that are performed “off-line” with respect to the instrument that collected the data, and ignores possible interactions between the analyst and the instrument itself. Large-scale projects organized around instruments such as detectors at the Large Hadron Collider, or around aggregations of weather data for severe weather prediction, have yielded useful application test beds, but ones that treat data as separate from the instrument that produced them and consider them only after their acquisition. For many major instrument facilities that provide advanced, complex or even experimental observational hardware, researchers with the expertise to judge the quality or appropriateness of the data may not have access to the proper tools to perform this analysis until well after the data have been collected, and not at the critical initial stages of the experiment when adjustments or corrections to the hardware can make a difference in the outcome.

Treating data as separate from its source and from the operations necessary to control instruments remotely in real time in order to generate, acquire and filter data, combined with improved prospects for making instruments and sensors network accessible, has left a gap in capabilities for e-Research. A potential way to close this gap, and an enabler for on-line real-time access to instruments and sensors, is a standard methodology for specifying and developing interfaces that existing and future instruments can provide and that data acquisition and reduction applications can rely on. To this end we have been developing the Common Instrument Middleware Architecture (CIMA) [Devadithya et al., 2005], a Web Services based approach to making instruments and sensors network accessible. CIMA provides a standards-based, uniform way to interact remotely with instruments and the data they produce. CIMA addresses the following issues in remote access: (1) standardization of the network protocol for interacting with instruments, sensors and actuators, (2) flexibility in the underlying network transport, (3) efficient and high throughput data transport, (4) availability of computational, storage and networking resources in the instrument or sensor controller, (5) graceful evolution of instrument design, and (6) reuse of data acquisition and processing codes.

A solution for software dependence on the details of hardware design is a middleware layer that abstracts control of instruments. A common middleware layer makes it possible for instrument users to build, test and deploy new software to implement their experiments as needed, and to continue to use existing software without modification when instruments are redesigned or upgraded. An instrument middleware layer also allows instrument developers to open up their products to a broader range of control applications, and promotes the fusion of data from multiple sources in real time.

A vision articulated by [Berners-Lee et al., 2001] for the Semantic Web includes the notion of explicitly integrating information, services and hardware through descriptive ontologies to the point that software agents and people could effectively find and organize “real world” resources to improve quality of life. The intent in the CIMA program is to explore these ideas within the domain of sensors and instruments for exploring and measuring the physical world.

2 Overview of CIMA and the role of the instrument description
CIMA provides a network-addressable wrapper service around existing instruments and sensors that adds the following components to the base functions provided by an instrument and its controller: a network interface (IP or other interconnect), one or more Channel services, and one or more Plug-ins that map channel requests to hardware functions. Channel services consist of a Web Services [Champion et al., 2002] endpoint for each logical grouping of instrument functions and implement the Parcel protocol. Parcels are XML documents or fragments that identify what operations a client wants to perform on the Channel and, when sent to a client from a CIMA service, contain data or result codes from the sensors or actuators subsumed by the Channel. Parcel protocol operations include get to obtain a sensor reading, set to set an actuator or parameter value, describe to return a description of the instrument, and register for a client to register a Web Services endpoint to receive streaming data. Additional operations support user authentication and authorization at the CIMA service. Plug-ins map logical names for sensors and actuators, provided by the Channel’s describe function, to functions that read and write the actual devices. Output from a plug-in may include the sensor value plus additional metadata such as a time stamp or error code.

Although the default communications protocol and transport are document/literal encoded SOAP (see footnote 1) over Secure Hypertext Transport Protocol (HTTP/S) over Transmission Control Protocol (TCP), all levels are decoupled and essentially any transport can be used. Alternative transports we have implemented include Java Message Service over TCP and Antelope [Vernon, 2002, ant, 1998], a system for carrying data from seismic ground motion sensors. Figure 1 illustrates the main components of CIMA and their relationship to the underlying instrument.
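The relationship between a Channel, its Parcel operations and its plug-ins can be sketched in a few lines of Python. This is only an illustration under stated assumptions: plain dictionaries stand in for XML Parcels, and the class name `Channel`, the dictionary keys and the method names are hypothetical rather than part of the CIMA implementation.

```python
import time
from typing import Any, Callable, Dict, List

# Hypothetical plug-in signatures: a read function returns a value,
# a write function accepts the new value. Neither name comes from CIMA.
ReadFn = Callable[[], float]
WriteFn = Callable[[float], Any]


class Channel:
    """Toy stand-in for a CIMA Channel service (illustrative only)."""

    def __init__(self) -> None:
        self._sensors: Dict[str, ReadFn] = {}
        self._actuators: Dict[str, WriteFn] = {}
        self._subscribers: Dict[str, List[Callable[[Dict[str, Any]], Any]]] = {}

    def add_sensor(self, name: str, read: ReadFn) -> None:
        # Bind a logical sensor name to the plug-in function that reads it.
        self._sensors[name] = read

    def add_actuator(self, name: str, write: WriteFn) -> None:
        # Bind a logical actuator name to the plug-in function that writes it.
        self._actuators[name] = write

    def handle(self, parcel: Dict[str, Any]) -> Dict[str, Any]:
        """Dispatch one request parcel and return a response parcel."""
        op = parcel.get("operation")
        name = parcel.get("name")
        if op == "describe":
            return {"status": "ok",
                    "sensors": sorted(self._sensors),
                    "actuators": sorted(self._actuators)}
        if op == "get" and name in self._sensors:
            # The response carries the value plus metadata (a time stamp).
            return {"status": "ok", "name": name,
                    "value": self._sensors[name](),
                    "timestamp": time.time()}
        if op == "set" and name in self._actuators:
            self._actuators[name](parcel["value"])
            return {"status": "ok", "name": name}
        if op == "register" and name in self._sensors:
            self._subscribers.setdefault(name, []).append(parcel["endpoint"])
            return {"status": "ok", "name": name}
        return {"status": "error", "reason": "unknown operation or name"}

    def stream_once(self, name: str) -> None:
        """Read a sensor once and push the reading to every registered endpoint."""
        reading = self.handle({"operation": "get", "name": name})
        for endpoint in self._subscribers.get(name, []):
            endpoint(reading)
```

A client would call `handle` with, for example, `{"operation": "get", "name": "crystal_temperature"}`; a real service would instead exchange XML Parcels over SOAP or one of the alternative transports.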
In Figure 1 a user application interacts with the CIMA Channel service through the following sequence of steps: 1) the user application registers to receive streaming data from a given sensor; 2) the Channel service calls the appropriate plug-in to set up the data stream; 3) the plug-in repeatedly returns the sensor’s data according to the specification in the request; 4) SOAP calls are made by the Channel to an endpoint given by the user application. All messages sent and received by the Channel are instances of the Parcel schema. In addition to streaming data, the application can make requests to return or get one sensor reading or to issue an actuator set command (steps 4 and 5 in Figure 1). Other “immediate” commands include retrieving a description of the instrument’s sensors and actuators as OWL-DL instances (describe), and authorization of the user application (session).

Figure 1: CIMA operational overview.

Footnote 1: See “Simple Object Access Protocol (SOAP) 1.1,” W3C Note 08 May 2000. http://www.w3.org/TR/SOAP/

The details of the operation of a plug-in can be discovered by the user application through the instrument description available through the Channel protocol’s describe operation. The instrument description can also be used by applications to help parse data messages from, and build control messages to, CIMA-enabled instruments and sensors. This self-description functionality makes it possible for applications to interoperate with a CIMA-enabled instrument at a semantic level, preserving investments in code across upgrades of a specific instrument, and making applications work across a range of instruments with similar functionality but from different vendors. For example, an application could use an instrument description to build a graphical user interface that connects GUI widgets to sensors and actuators through a Channel wrapping an instrument, allowing users to control the instrument remotely in real time.

One important difference between the CIMA approach and related projects such as the W3C device description language CC/PP (see footnote 2) is that the description in CIMA is intended to be embedded in the device, not referenced through a device “key” and retrieved from an external vendor-provided web site. A device can modify the embedded CIMA description as needed to reflect state in real time for that particular instance of hardware.
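As a rough illustration of how an application might turn an instrument description into a user interface, the following Python sketch maps sensors to read-only displays and actuators to sliders. The dictionary layout, the names in `DESCRIPTION` and the function `build_widget_specs` are assumptions made for this example; an actual CIMA description is an OWL-DL document returned by the describe operation.

```python
from typing import Any, Dict, List

# Hypothetical, simplified description used only for this sketch.
DESCRIPTION: Dict[str, Any] = {
    "instrument": "diffractometer",
    "sensors": [{"name": "crystal_temperature", "unit": "K"}],
    "actuators": [{"name": "phi", "unit": "deg", "min": 0.0, "max": 360.0}],
}


def build_widget_specs(description: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Map each sensor to a read-only display and each actuator to a slider."""
    specs: List[Dict[str, Any]] = []
    for sensor in description.get("sensors", []):
        specs.append({
            "widget": "readout",
            "label": f"{sensor['name']} [{sensor['unit']}]",
            "bind": sensor["name"],  # logical name used in get requests
        })
    for actuator in description.get("actuators", []):
        specs.append({
            "widget": "slider",
            "label": f"{actuator['name']} [{actuator['unit']}]",
            "bind": actuator["name"],  # logical name used in set requests
            "range": (actuator["min"], actuator["max"]),
        })
    return specs


if __name__ == "__main__":
    for spec in build_widget_specs(DESCRIPTION):
        print(spec)
```

Because the widget list is derived from the description rather than hard-coded, the same application code could in principle serve a different instrument that exposes a different set of sensors and actuators.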
Embedding the description in the device opens up the possibility of exposing device class, device instance and temporal state information about an individual device, which is not possible with external, device-class type descriptions. Also, if this information is embedded in the device itself, the process of discovery can be made independent of directory services.

Footnote 2: See “Composite Capabilities/Preferences Profile Public Home Page”, http://www.w3.org/Mobile/CCPP/

A key design objective of CIMA is to make instruments and sensors self-describing, which will enable both systems and users to obtain general information about the available instruments and sensors in CIMA, what data the instruments and sensors provide, and how they can be accessed and configured. This allows components downstream in the data acquisition and reduction process to understand and manage the instruments and sensors more effectively by, for example, applying appropriate conversions and calibrations to the data. The annotation of instrument data provides information needed for human interpretation, organization and curation of the data.

The development of these components is predicated on a “CIMA ontology” for instruments and sensors; the current implementation is encoded in OWL-DL (Web Ontology Language - Description Logic), a subset of OWL [McGuinness and Harmelen, 2004]. OWL-DL was chosen for several reasons: it makes the description amenable to machine reasoning tasks, it facilitates distributed development and extension of the CIMA ontology and, through inferencing, it makes it possible to check the consistency of the ontology even across multiple developers and sites. In addition, XML Schema products such as SensorML (see footnote 3) and the ISO schemas for location (ISO-19107) and time (ISO-19108) can be leveraged as XML Schema data types from within the RDF specification of the ontology, i.e., instances of the CIMA ontology can refer to resources that are XML documents based on these Schemata or can use types from XML Schemata to type RDF resources.

Figure 2: Instrument descriptive model and correspondence between sensors and actuators and plug-ins in a CIMA service.

The CIMA descriptive instrument model consists of several levels, as illustrated in Figure 2. At the outer level is the observatory, the location of one or more instruments with related functionality. An example of an observatory is a crystallography bay containing a goniostat, a CCD array, and several temperature probes used to collect data for an X-ray diffraction crystallography experiment [Bramley et al., 2006]. An observatory has one or more instruments, each consisting of several sensors, which provide observations of measurable quantities, and actuators, which control the instrument.
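The nesting of observatory, instrument, sensor and actuator, together with the correspondence between hardware elements and plug-ins, can be pictured with a small Python data model. The class and field names below, and the plug-in identifiers, are hypothetical and chosen only for illustration; the actual model is expressed as OWL-DL classes, not Python classes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sensor:
    name: str
    observes: str   # measurable quantity, e.g. "temperature"
    plugin: str     # hypothetical identifier of the plug-in that reads it


@dataclass
class Actuator:
    name: str
    controls: str   # what the actuator changes on the instrument
    plugin: str     # hypothetical identifier of the plug-in that drives it


@dataclass
class Instrument:
    name: str
    sensors: List[Sensor] = field(default_factory=list)
    actuators: List[Actuator] = field(default_factory=list)


@dataclass
class Observatory:
    name: str
    instruments: List[Instrument] = field(default_factory=list)

    def plugin_map(self) -> List[Tuple[str, str]]:
        """List (instrument/element, plug-in) pairs for every sensor and actuator."""
        pairs: List[Tuple[str, str]] = []
        for inst in self.instruments:
            for element in [*inst.sensors, *inst.actuators]:
                pairs.append((f"{inst.name}/{element.name}", element.plugin))
        return pairs


# Example loosely modeled on the crystallography bay mentioned in the text.
bay = Observatory("crystallography_bay", [
    Instrument(
        "diffractometer",
        sensors=[Sensor("ccd", "diffraction image", "ccd_plugin"),
                 Sensor("probe_1", "temperature", "thermocouple_plugin")],
        actuators=[Actuator("goniostat", "crystal orientation", "goniostat_plugin")],
    ),
])

if __name__ == "__main__":
    for path, plugin in bay.plugin_map():
        print(path, "->", plugin)
```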
In the crystallography example, the individual thermocouples and hygrometers are sensors, and the goniostat positioning system is an actuator. The CIMA ontology in OWL-DL consists of classes to model the physical components such as observatories, instruments, sensors and actuators, as well as the observable products of sensors and the service characteristics of the instruments. The next section describes in detail the structure of the ontology that captures CIMA components and their attributes and responses.

Footnote 3: See http://vast.nsstc.uah.edu/SensorML/

3 The structure of the CIMA ontology
The purpose of the CIMA ontology is to provide an extensible and standardized vocabulary for describing hardware resources linked to the World-Wide Web (or at least network addressable using common protocols). The descriptions may be useful for machines as well as people to find out about such hardware resources and the data they generate, and to operate and control them. This vision builds upon the Semantic Web [Berners-Lee et al., 2001], whose main goal is to make Web content meaningful not only to people but also to machines by providing the Web languages and software tools needed to tag Web pages with metadata that defines the meaning of Web content. An ontology that gives meaning to Web content enables existing services such as search to improve their methods for finding and comparing information on the Web. Similarly, an ontology that gives meaning to hardware resources accessible through standard Web protocols will enable existing services to improve discovery, sharing, access and control of such hardware resources. For example, an OWL-DL description of the instruments available in CIMA will allow users to search more effectively for available instruments and to compare their capabilities and controls. Work to date has focused on providing a set of extensible classes and instances for describing CIMA instruments using OWL-DL. However, the long-term goal is to build a generic ontology capable of describing any sensor and actuator hardware resources on the Web.
Consistent with CIMA’s physical and logical model of instruments and data collection mechanisms discussed previously, the ontology includes classes that describe (1) an instrument and its properties such as the physical and logical location, (2) the kinds of sensors and actuators used to collect data and to control instruments, (3) the physical phenomena that can be detected with sensors, (4) the products of sensors and the models that describe their responses, and (5) the communication with CIMA instruments to trigger data collection and to configure instruments.

Based on the conventions of the International System of Units (SI), the CIMA ontology differentiates between fundamental and derived physical phenomena and the corresponding units used to measure them. The conversion between equivalent units is described in algebraic expressions, allowing a system that uses a CIMA instrument description to convert units automatically as needed. For example, the ontology describes time as a fundamental phenomenon measured in seconds and provides algebraic expressions to convert time from seconds into derived units such as minutes, hours, days, and years.

A sensor may provide different data sets corresponding to the different phenomena it detects. Thus, the ontology describes the products of sensors as components, modeling the specific phenomenon that a sensor may detect, the value it delivers, and the physical model by which it responds and behaves in the environment. For example, an instrument may be equipped with a sensor that measures humidity and temperature. In this case, the sensor’s product must be modeled with two components, detecting either humidity or temperature, each responding according to a specific physical model, and each delivering a specific type of value such as an integer or float value in a particular range.
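The two ideas above, unit conversion expressions and sensor products made of typed components, can be sketched as follows. This is a minimal Python illustration under assumptions: the table `SECONDS_PER_UNIT`, the class `Component` and the chosen value ranges are hypothetical, and years are left out because the ontology's definition of a year is not given here.

```python
from dataclasses import dataclass
from typing import Dict, List

# Conversion expressions from the base unit (seconds) to derived time
# units, reduced here to simple divisors: derived = seconds / divisor.
SECONDS_PER_UNIT: Dict[str, float] = {
    "second": 1.0,
    "minute": 60.0,
    "hour": 3600.0,
    "day": 86400.0,
}


def convert_seconds(value: float, unit: str) -> float:
    """Convert a duration in seconds into the named derived unit."""
    return value / SECONDS_PER_UNIT[unit]


@dataclass(frozen=True)
class Component:
    """One component of a sensor product: a phenomenon plus value constraints."""
    phenomenon: str
    unit: str
    value_type: type
    minimum: float
    maximum: float

    def accepts(self, value: object) -> bool:
        # A reading is valid if it has the declared type and lies in range.
        return (isinstance(value, self.value_type)
                and self.minimum <= value <= self.maximum)


# Product of a combined humidity/temperature sensor: two components.
# The ranges are illustrative values, not taken from the CIMA ontology.
HUMIDITY_TEMPERATURE_PRODUCT: List[Component] = [
    Component("humidity", "percent", float, 0.0, 100.0),
    Component("temperature", "kelvin", float, 0.0, 400.0),
]

if __name__ == "__main__":
    print(convert_seconds(7200.0, "hour"))
    print(HUMIDITY_TEMPERATURE_PRODUCT[0].accepts(55.0))
```

Keeping the conversion table as data rather than code mirrors the idea that a client can read the expressions from the description instead of hard-coding them.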
Finally, the ontology associates a component with a communication handle, which describes the communication protocol for obtaining sensor data for the measured phenomenon.

An instrument may be equipped with several actuators that allow users to influence how it interacts with its environment, to measure different phenomena and to collect data. The CIMA ontology models actuators as controls that have a particular control value type and a range of values that can be changed. Analogous to a product’s components, the ontology models a control’s response model, which predicts how an actuator behaves, and associates a control with a communication handle for obtaining the status of an actuator and modifying it.

To facilitate searching for instruments and to monitor changes in instrument locations and positioning, the CIMA ontology describes the physical and logical location of instruments. To describe physical locations, the ontology builds upon existing vocabulary based on the Geography Markup Language for geographical positioning information. Additionally, the ontology provides concepts such as observatory to specify the organizational unit of instruments in CIMA.

Figure 3 presents a screenshot depicting a visualization of a subset of the properties and classes in the CIMA ontology in the form of a concept map [Novak and Gowin, 1984, Hayes et al., 2005]. The depicted classes describe a specific diffractometer that has a goniometer for controlling the instrument and a detector for obtaining X-ray diffraction patterns.
Figure 3: Subset of the CIMA ontology properties and classes.
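The way the ontology models an actuator as a control, with a value range, a response model and a communication handle, can be illustrated with a short Python sketch. Everything named here is an assumption for the sake of the example: the class `Control`, the handle string and the motor resolution `STEPS_PER_DEGREE` do not come from the CIMA ontology.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Control:
    """Illustrative actuator control: allowed range, handle, response model."""
    name: str
    minimum: float
    maximum: float
    handle: str                         # hypothetical communication handle
    response: Callable[[float], float]  # control value -> predicted effect
    value: float = 0.0

    def set(self, requested: float) -> float:
        """Accept a value inside the allowed range and return the predicted effect."""
        if not (self.minimum <= requested <= self.maximum):
            raise ValueError(
                f"{self.name}: {requested} outside [{self.minimum}, {self.maximum}]"
            )
        self.value = requested
        return self.response(requested)


STEPS_PER_DEGREE = 100.0  # assumed motor resolution, for illustration only

# A goniometer axis driven in motor steps; the response model predicts
# the resulting rotation angle in degrees.
phi = Control(
    name="phi_steps",
    minimum=0.0,
    maximum=36000.0,
    handle="goniometer/phi",
    response=lambda steps: steps / STEPS_PER_DEGREE,
)

if __name__ == "__main__":
    print(phi.set(9000.0))  # predicted angle in degrees
```

A client that knows the range and response model from the description can reject out-of-range requests and show the predicted effect before anything is sent to the hardware.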
The CIMA ontology uses a variety of OWL-DL class axioms and property restrictions to ensure that the ontology (1) accurately models CIMA hardware, its data products and communication mechanisms and (2) supports consistency checking using existing reasoning engines. The consistency checks will guarantee that descriptions of instruments and sensors are encoded in accordance with the guidelines and rules stipulated by the CIMA ontology and that extensions of the original CIMA ontology do not violate any of the carefully designed constraints between the original classes and properties. The benefit of such automatic consistency checks is that they simplify the management of instrument and sensor descriptions, which may come from different authors at different places in the CIMA community, by ensuring that all descriptions are uniformly coherent. This sets the approach we pursue in the CIMA project apart from conventional descriptions encoded in XML Schema.

4 Building instrument descriptions: A CIMA-specific case study
A key application for validating the approaches taken in CIMA is X-ray diffraction crystallography. In a diffraction crystallography experiment a crystal is suspended in an X-ray beam and the diffraction patterns caused by the regular structure of the crystal are captured by a CCD detector. The crystal is rotated using a goniostat and CCD images are collected at each set of rotation angles. In addition to the primary data of the CCD images, a number of other data are collected to assure the quality of the images and to allow problems with the experiment to be diagnosed. These additional parameters include crystal temperature, CCD temperature, CCD cooling water temperature, air temperature and humidity, and liquid nitrogen tank level. At synchrotron light sources there are additional variables related to the beam current and X-ray optics that are also relevant. Figure 4 shows a typical diffractometer consisting of a goniostat, X-ray source, CCD detector and cryostat. The crystal under study is mounted in the X-ray beam in front of the detector and is cooled by liquid nitrogen from the cryostat.
Figure 4: A typical laboratory X-ray diffractometer.
An instrument description is based on a general framework (see Figure 2) in which an observatory (research facility) contains instruments; each instrument contains sensors or actuators; and each sensor and actuator has a detailed description of what observable it provides or function it performs, respectively. Each hardware element has a corresponding plug-in or driver that the service instantiates and through which users communicate with the instrument’s hardware.

In addition to observables and response models for sensors and actuators, the description must provide information about how to interact with the instrument to acquire specific sensor readings and to control actuators. This information is site and installation specific, and depends on the software configuration of the CIMA service that controls the instrument. When a service is set up, network addressing information in the form of a URL for the CIMA service is provided along with the internal names used to identify plug-ins for individual sensors and actuators. This level of customization is done at run time through a configuration file for the service that specifies the plug-ins to use for this instrument, local names for each plug-in’s sensors and actuators, and any hardware configuration and start-up parameters. Network and local addressing information as well as initialization parameters can be added to the ontological description at run time by the plug-in or given beforehand by editing the description directly.

An instrument description may be built entirely by a plug-in designer using an existing description for the same or a similar model, or it may be constructed from scratch using an instance editor we have developed. As mentioned above, some run-time customization can be provided as a function of the plug-ins to supply instance addressing and customization information. Functionality for an instrument is divided into three categories: what observables it senses and reports, along with units and response models for sensors; what electronic or mechanical controls are available to the user and how control parameters map to physical or logical changes in the instrument; and how to access sensor data or interact with controls, including addressing and address namespace specifications.
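To make the role of the service configuration more concrete, the following Python sketch reads a configuration and derives the addressing information that could be merged into an instrument description at run time. The JSON layout, the key names, the example URL and the function `build_handles` are all assumptions made for this illustration; the actual CIMA configuration file format is not specified here.

```python
import json
from typing import Dict

# Hypothetical service configuration: service URL, plug-ins, local names
# for sensors and actuators, and start-up parameters.
CONFIG_TEXT = """
{
  "service_url": "https://cima.example.org/channel",
  "plugins": [
    {"plugin": "thermocouple_plugin",
     "sensors": ["crystal_temperature", "ccd_temperature"],
     "actuators": [],
     "startup": {"poll_interval_s": 5}},
    {"plugin": "goniostat_plugin",
     "sensors": [],
     "actuators": ["phi"],
     "startup": {}}
  ]
}
"""


def build_handles(config_text: str) -> Dict[str, Dict[str, str]]:
    """Map each local sensor/actuator name to its plug-in and network address."""
    config = json.loads(config_text)
    handles: Dict[str, Dict[str, str]] = {}
    for entry in config["plugins"]:
        for local_name in entry["sensors"] + entry["actuators"]:
            handles[local_name] = {
                "plugin": entry["plugin"],
                # Address scheme (URL plus fragment) is an illustrative choice.
                "address": f"{config['service_url']}#{local_name}",
            }
    return handles


if __name__ == "__main__":
    for name, handle in build_handles(CONFIG_TEXT).items():
        print(name, handle)
```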
Since all interaction takes place through the CIMA service and plug-ins, the description information for accessing and controlling the instrument is determined in some part by the semantics of how a given plug-in functions.

5 Applications of the CIMA Ontology
CIMA is a joint project involving seven research labs from the US, the UK, and Australia to provide improved access to X-ray crystallography instruments, telescopes, and sensor networks and to share data. Our goal is to provide a complete description of the equipment of the research labs that have joined CIMA using the CIMA ontology discussed previously. The OWL-encoded description will then be used by standalone applications that automatically query available instruments and sensors in the network to create graphical user interfaces that provide access to and control of lab equipment.

Figure 5 shows a screenshot of a prototype system that is under development for the CIMA project. The system uses a set of CIMA instrument descriptions to build an interface that allows users to connect to an instrument, to access general information and data, and to control the instrument. The screenshot depicts on the left side a set of instrument properties and on the right side the values of the properties, which include actual sensor data and GUI components for controlling the instrument. Sensors and controls for actuators are grouped, simplifying quick inspection of the available sensors and actuators of an instrument. The right side depicts the three angle values of the single crystal diffractometer and sliders to modify them. The CIMA project will provide a similarly functioning application to enter the description of a new instrument and either to include the description in a CIMA service or to upload it to a central registry where systems can search for specific instruments and sensors.

Figure 5: A screenshot of the CIMA Instrument Access (CIA) application.

Preliminary experiments in which available instruments and their characteristics were automatically identified using the RDF Data Query Language (RDQL) [Seaborne, 2004] applied to our CIMA ontology have been successful. For example, the ontology supports construction of queries that identify available instruments that are located in a particular region or that measure a particular phenomenon. Such information, along with details of network addressing, can then be used by applications to access CIMA instruments and sensors and to perform a variety of tasks automatically, such as discovering newly installed equipment, analyzing its functionality and integrating access to such equipment into a user’s application (e.g. extending or customizing a GUI).

Another application we envision as part of the CIMA project is to provide customized access to instruments and sensors to improve flexibility in what data and controls can be seen. For example, consider a remotely accessible telescope and two types of users: professional astronomers and high school students. The former group’s characteristics might be used to select and make available a broad range of controls for dome status, telescope orientation, and optics and detector selection. The latter group might be interested only in selecting specific celestial objects for viewing, and in making minor adjustments in focus. The CIMA approach enables using a description of device capabilities together with user profile information to generate a user-appropriate interface.

Future work will include the development and evaluation of instances based on the CIMA ontology and extension of the ontology based on these use cases. Two areas of particular interest will be explored: using self-describing instruments and appropriate directory technology to improve “situation awareness” about the availability of (possibly new) network accessible sensors and instruments, and use of the ontology to support automatic data fusion. By way of a simple example that combines both situation awareness and data fusion, consider a public flood warning network based on data streams from government and privately installed river height gauges that are self-describing using the CIMA approach.
As new gauges are installed, the flood warning network is able to discover them and include them in the overall flood monitoring and warning system with minimal human intervention and without prior knowledge.

6 Conclusion
The ontology under development for the Common Instrument Middleware Architecture project forms a basis for describing instruments and sensors sufficient to find instances of hardware with specific capabilities and to use an instance, ultimately aimed at enabling the development of a Semantic Web for instruments and sensors. Components of the ontology provide users (people or software agents) with an awareness of available capabilities, information about the relative and absolute position of the device in space, information to build an operational model of a device including a description of the observables that the device can sense (sensors) and physical or logical states that can be modified externally (actuators), and a computational model that can be used to describe unit conversions, calibrations and other computational tasks related to the device’s operation. Finally, because of the characteristics of OWL-DL discussed above, this project can be an open effort with extensions and refinements of the ontology being carried out simultaneously by many independent groups.

Acknowledgements

The Common Instrument Middleware Architecture project is supported by National Science Foundation cooperative agreements and grants SCI 0330568 and MRI CDA-0116050, respectively.

References

[ant, 1998] (1998). Antelope arts configuration and operations manual: Documentation for antelope environmental monitoring software, software release 4.1. Technical report, Boulder Real-Time Technologies (BRTT).
[Berners-Lee et al., 2001] Berners-Lee, T., Hendler, J., and Lassila, O. (2001). The semantic web. Scientific American, 284(5):34–43.
[Bramley et al., 2006] Bramley, R., Chiu, K., Devadithya, T., Gupta, N., Hart, C., Huffman, J., Huffman, K., Ma, Y., and McMullen, D. (2006). Instrument monitoring, data sharing and archiving using common instrument middleware architecture (cima). Journal of Chemical Information and Modeling, 46(3):1017–25.
[Champion et al., 2002] Champion, M., Ferris, C., Newcomer, E., and Orchard, D. (2002). Web services architecture. World-Wide Web Consortium.
[Devadithya et al., 2005] Devadithya, T., Chiu, K., Huffman, K., and McMullen, D. (2005). The common instrument middleware architecture: Overview of goals and implementation. In Proceedings of the First IEEE International Conference on e-Science and Grid Computing (e-Science 2005), pages 99–106, Melbourne, Australia.
[Hayes et al., 2005] Hayes, P., Eskridge, T. C., Saavedra, R., Reichherzer, T., Mehrotra, M., and Bobrovnikoff, D. (2005). Collaborative knowledge capture in ontologies. In Proceedings of the Third International Conference on Knowledge Capture (K-Cap05), pages 99–106, Banff, Canada.
[McGuinness and Harmelen, 2004] McGuinness, D. and Harmelen, F. (2004). Owl web ontology language overview, w3c recommendation. World-Wide Web Consortium.
[Novak and Gowin, 1984] Novak, J. and Gowin, D. (1984). Learning How to Learn. Cambridge University Press, New York, NY.
[Seaborne, 2004] Seaborne, A. (2004). Rdql: A query language for rdf, w3c member submission. World Wide Web Consortium.
[Vernon, 2002] Vernon, F. (2002). Overview of the usarray real-time systems. EOS Trans AGU, 81, S318.