Connecting R to the Sensor Web - Semantic Scholar

34 downloads 0 Views 631KB Size Report
the Sensor Web Enablement Initiative (SWE) (Botts et al., 2008). The ob- servations can be provided in real-time or as archived data sets. They can be subsetted ...
Connecting R to the Sensor Web

DRAFT Daniel Nüst, Christoph Stasch, Edzer Pebesma Institute for Geoinformatics, University of Muenster Weseler Str. 253, 48151 Münster {daniel.nuest, staschc, edzer.pebesma}@uni-muenster.de

Abstract Interoperable data exchange and reproducibility are increasingly important for modern scientific research. This paper shows how three open source projects work together to realize this: (i) the R project, providing the lingua franca for statistical analysis, (ii) the Open Geospatial Consortium's Sensor Observation Service (SOS), a standardized data warehouse service for storing and retrieving sensor measurements, and (iii) sos4R, a new project that connects the former two. We show how sos4R can bridge the gap between two communities in science: spatial statistical analysis and visualization on one side, and the Sensor Web community on the other. sos4R enables R users to integrate (near real-time) sensor observations directly into R. Finally, we evaluate the functionality of sos4R. The software encapsulates the service's complexity with typical R function calls in a common analysis workflow, but still gives users full flexibility to handle interoperability issues. We conclude that it is able to close the gap between R and the sensor web.

1 Introduction As the whole process of environmental analysis is moving to the internet (creating a model web (Geller, 2008)) the data sources also are, which re-

sults in a growing amount of spatially distributed sensor data publicly available through standardized web service interfaces. Yet, client applications to access, to analyze and to visualize this data lag behind compared to spatial data in a raster or vector format. This is partly due to a gap between the research fields of sensor web and data analysis or geostatistics. The former group devotes itself to the development and implementation of standards for sensor data storage and exchange in the realm of Open Geospatial Consortium (OGC) or International Organization for Standardization (ISO) – a rather technical undertaking often based in computer science. The latter group, often domain specialists, actually analyzes the data to infer about the processes that generated it. This organizational and personal difference (in the sense that few people work in both research areas) results in a separate development of tools and knowledge about servicebased data retrieval and data processing. We try to narrow this gap by providing a tool that can benefit both communities and further facilitate interdisciplinary collaboration and research. A motivation for this work is to increase acceptance of the Sensor Web ideas and tools, and allow more data analysts and modellers to benefit from the Sensor Web. The general driving force is to motivate analysts and researchers to adopt open source software and open standardized interfaces, and to conduct reproducible research. To achieve this goal, the main contribution of this work is a Sensor Observation Service (SOS) (Na et al., 2007) client written in R (R Development Core Team, 2010) that addresses the translation of the required observation models to R data structures. We investigate if a (partial) gap closure can be done by a single piece of software as the decisive chain link in a sequence of existing tools. The term reproducible research was first proposed by Jon Claerbout (Fomel and Claerbout, 2009) to tackle the issues of current scientific publications in the domain of computational sciences, where the components necessary to recreate the presented results are completely provided. It "refers to the idea that the ultimate product of research is the paper along with access to the full computational environment used to produce the results in the paper such as the code, data, etc. necessary for reproduction of the results and building upon the research" (Reproducible Research Planet, 2010). The test set-up is a critical component in experimental sciences. Although in computational science one can expect that the same code and data result in the same outcome, there still is the aspect of sharing the both of these. By using tools that directly connect code and publication reproducible research does not only allow others to replicate results, but also the original authors themselves. We want to support the analysis part of reproducibility from a technical standpoint.

The remainder of this paper is structured as follows. Sections 1.1 to 1.3 present the basic concepts and components upon which the product is built and related work. Section 2 introduces a short use case whose exemplary steps are used to develop the requirements presented in section 3. Afterwards we describe the concept of sos4R in sections 4 to 5 and conclude with an evaluation (section 6). An agenda for future work is outlined in section 7. 1.1 The R project R is an emerging language and environment for statistical computing and graphics (Vance, 2009). It provides a wide range of statistical techniques and analysis functions as well as capabilities of graphical plotting that stand out and partly underlie its success. A complete description of R's functions and its user and developer communities (ranging from finance to biomedicine) is not possible here, but the most important features from the sos4R perspective are: R is freely available under an open source license; it is extensible via packages, which are small (to large) units of code and data that can be comfortably added to an R environment. The Sweave document format (Leisch, 2005) supports the concept of weaving, where the text and the analysis code are in the same document (introduced by Knuth under the term "literate programming" (Knuth, 1984)) and is particularly well suited for reproducible research as a single document generates text and results. The package cacher implements cached computations as a framework for reproducible research (Peng, 2008). By plugging in the most important data source of the OGC Sensor Web into R, a wide range of data becomes readily accessible to the large community of R users. 1.2 The Sensor Observation Service Providing data via web services supersedes the need for local file copies, which might become outdated. Also, a flexible filtering of data on the service side reduces the communication necessary to download the data. The OGC SOS is a common specification for a web service providing pullbased access to sensor observations and sensor descriptions. It is part of the Sensor Web Enablement Initiative (SWE) (Botts et al., 2008). The observations can be provided in real-time or as archived data sets. They can be subsetted in a flexible way. For example, a query like "Provide observations from water gauge Cologne_Rhine_1 for the time period from 01/01/2010 to 31/08/2010 where the water level is above 5 meters" can be

created in a common format and sent to the SOS which returns only matching observations. As the SOS always provides the observations in a common format based on Extensible Markup Language (XML)1, namely Observations & Measurements (O&M) (Cox, 2007a), clients do not have to adapt different data formats when integrating observations. An observation consists of information about the geographic feature which is observed, the time when the observation was taken, the sensor, the observed phenomenon, and the observation's result. The SOS relies upon the Sensor Model Language (SensorML) (Botts, 2007) for providing sensor metadata like the sensor's position, calibration information, and inputs and outputs of the sensing process. By connecting the powerful R environment with the SOS, a huge amount of features for analysis and visualization become readily accessible to the community of SOS users. 1.3 Related Work The OGC offers widely used service standards apart from SOS, for example Web Features Service2 (WFS), Web Coverage Service3 (WCS) and Web Map Service4 (WMS). A connection is possible in R through the rgdal (Keitt et al., 2010) package which provides bindings to the Geospatial Data Abstraction Library (GDAL)5. However, these do not (directly) support typical sensor data, like in situ measurements, but vector and raster (“image”) data. WFS could provide sensor observations as point features, but that is not intended. With this distinction, related work can be seen from different perspectives. First there are other packages in R which allow the comfortable download of near real-time data directly into R data structures. Second, there are other clients for Sensor Observation Services. For the former, one can find a few packages, for example the GEOquery package of the Bioconductor project (Davis, 2007) and WDI (ArelBundock, 2010) package, which allow access to the Gene Expression Omnibus repository of gene expression experiments, and access to the World Bank's World Development Indicators respectively. The quantmod package (Ryan, 2008) provides functions for downloading stock exchange data. However, these use proprietary interfaces for a specific data base. http://www.w3.org/XML/ http://www.opengeospatial.org/standards/wfs 3 http://www.opengeospatial.org/standards/wcs 4 http://www.opengeospatial.org/standards/wms 5 http://www.gdal.org/ 1 2

The latter perspective comprises mostly viewers of varying abilities:  simple viewers, like the OpenOOIS.org Map Viewer6 or the SOS client for OpenLayers7, which show the positions of sensors on a map and can include links to retrieve observational data;  advanced viewers, like 52°North Clients (52°North, 2010) or GeoCENS Sensor Web Browser (Liang et al., 2010), which include (three dimensional) visualization of sensors‟ positions on a map, tabular or graphic display of data;  plug-ins to geographic information systems, like ArcGIS and uDig (52°North, 2010) or gvSIG (Tamayo et al., 2009), which allow the import, subsequent processing with the respective software and most importantly combining with a large number of other data sources.

2 Application Example To illustrate the use of the sos4R package, we present the use case of a researcher modelling local weather data. She requires temperature measurements for the region and the time frame of interest. The workflow consists of the following steps: 1. Create a connection to a SOS 2. Set up query parameters 3. Request observation data 4. Analyse and visualize data with R functionalities (not part of this work). For the sake of simplicity this example does not utilize the (arbitrary) spatial, temporal and value-based querying features of the SOS. Listing 1 and Figure 1 show the complete code and the result of the analysis, a temperature plot. The code and figures are also available from http://www.nordholmen.net/sos4r/agile2011. library("sos4R"); library("xts") # 1. step weathersos = SOS("http://v-swe.uni-muenster.de:8080/WeatherSOS/sos") # 2. step station

Suggest Documents