The OGC Web Coverage Processing Service (WCPS) Standard Peter Baumann Jacobs University Bremen 28759 Bremen Germany

[email protected]

ABSTRACT

Imagery is increasingly becoming an integral part of geo services. More generally, an increasing variety of sensors is generating massive amounts of data whose quantized nature frequently leads to rasterized data structures. Examples include 1-D time series, 2-D imagery, 3-D image time series and x/y/z spatial cubes, and 4-D x/y/z/t spatio-temporal cubes. The massive proliferation of such raster data through a rapidly growing number of services makes open, standardized service interfaces increasingly important.

Geo service standardization is undertaken by the Open Geospatial Consortium (OGC). The core raster service standard is the Web Coverage Service (WCS), which specifies retrieval based on subsetting, scaling, and reprojection. In 2008, OGC issued a companion standard which adds flexible, open-ended coverage processing capabilities. This Web Coverage Processing Service (WCPS) specifies a coverage processing language allowing clients to send requests of arbitrary complexity for evaluation by the server. This contribution reports on the WCPS standard by giving an introduction to its coverage model and processing language. Further, design rationales are discussed, as well as background and relation to other OGC standards. 1-D to 4-D use case scenarios illustrate intended use and benefits for different communities. Although the paper focuses on conceptual issues, the WCPS reference implementation, PetaScope, is briefly addressed. The author is co-chair of the coverage-related working groups in OGC.

Categories and Subject Descriptors E.1 [DATA STRUCTURES]: Arrays; H.2.8 [DATABASE MANAGEMENT]: Database Applications - Image databases, Scientific databases, Spatial databases and GIS

1. MOTIVATION

Imagery is increasingly becoming an integral part of geo services. From a more general perspective, a variety of remote and in situ sensors generate massive amounts of data where the quantized nature of the measurements frequently leads to a rasterized data structure. Examples include 1-D time series, 2-D imagery, 3-D image time series and x/y/z spatial cubes, and 4-D x/y/z/t spatiotemporal cubes. In parallel to the massive proliferation of such raster data through a rapidly growing number of services, the demand arises for not just simple data extraction and download services, but more flexible retrieval functionality.

This foreseeably will soon go beyond just offering subsetting from some server-side imagery. Actually, we face the transition from mere data services to value-adding information services, including adequate data processing capabilities; in other words, we face a paradigm shift from data stewardship to service stewardship. In this contribution we adopt a standards-based perspective, introducing a recently published standard for a high-level language allowing complex server-side processing of multidimensional raster data.

Standardization of open, interoperable geo services is performed by the Open Geospatial Consortium (OGC, www.opengeospatial.org) in collaboration with ISO, OASIS Open, W3C, and other relevant bodies. One of the historically first and still most prominent standards is the Web Map Service (WMS) Implementation Standard [9], which defines map image generation based on client-side parameter specification controlling server-side rendering ("portrayal"). While the results of such portrayal are ready for, e.g., immediate display in a Web browser, such rendered images usually are not suitable for further processing, e.g., in some analysis tool.

For this purpose, the Web Coverage Service (WCS) Standard has been developed [39]: "Unlike the WMS, which portrays spatial data to return static maps (rendered as pictures by the server), the Web Coverage Service provides available data together with their detailed descriptions; defines a rich syntax for requests against these data; and returns data with its original semantics (instead of pictures) which may be interpreted, extrapolated, etc. - and not just portrayed." WCS basically allows retrieval based on subsetting, scaling, and reprojection. As such, WCS is the central OGC standard for simple, easy-to-use coverage services. The term "coverage" is used by ISO and OGC to denote a "space-varying phenomenon", which in practice today boils down to raster data. A more detailed discussion of the term will be given in Section 5.1.

Effective December 2008, OGC issued an extension to WCS which adds flexible, open-ended coverage processing capabilities. This Web Coverage Processing Service (WCPS) Interface Standard [2] specifies a processing language as client/server interface which can be characterized as "SQL for coverages". WCPS allows for manifold ad-hoc processing of coverage data, such as deriving the vegetation index, determining statistical evaluations, and generating different kinds of plots, like classification, histograms, etc.

The purpose of this contribution is to present the WCPS standard to the geo and GIS community. To this end, the remainder is organized as follows. Section 2 puts WCPS into the context of OGC's work. The main considerations and requirements that have guided WCPS development are discussed in Section 3, followed by a review of the state of the art in Section 4. Section 5, the core part, presents WCPS model and language concepts. For a practical assessment, a cross-dimensional set of use case scenarios is discussed in Section 6. Section 7 gives a summary and outlook.

2. OGC

OGC issues a family of modular geo service standards which are accessible free of cost. While the number of individual specifications sometimes is perceived as a disadvantage, the complexity of the matter actually mandates such modularization. Of course, maintaining harmonization between the specifications represents a continuous challenge which remains on the agenda of the specification writing working groups. A possible grouping of OGC specifications runs as follows:
• core services: for the classical triad of geo data (vector, raster, and metadata), the Web Feature Service (WFS) [29], WCS [39], and Catalog Service (CS-W) [25] are provided.
• value-adding services: on top of the core services, additional service specifications allow for browser-based navigation (such as WMS), processing (such as the Web Processing Service (WPS) [34]), and additional service features (such as Digital Rights Management, GeoRM [36]), to name but a few.
• topical specification families, such as the Sensor Web Enablement (SWE) [7].

All specifications are based on a common architecture in support of OGC's vision of geospatial technology and data interoperability. The Abstract Specification document series provides the conceptual foundation for OGC specification development activities. Open interfaces and protocols are built and referenced against the Abstract Specification, thus enabling interoperability between different brands and different kinds of spatial processing systems. Several of the Abstract Specification documents are adopted from the corresponding standards developed by ISO TC211.

In addition to the high-level, abstract reference model, the OWS Common specification [37] delineates technical cornerstones to which OGC standards need to adhere. For example, OWS Common mandates a GetCapabilities request type for every service, which allows clients to retrieve information about data sets offered and service capabilities provided by a server implementing the standard. Further, common sets of metadata as well as canonical structures for the request XML schemas are laid down there.

Figure 1: Some of the most basic OGC services

Figure 1 sketches the WFS, WCS, and CS-W standards suites, plus WMS. WFS-T, WCS-T, and CS-T represent so-called transactional services which allow for updating of data sets on a server. Filter Encoding (FE in Figure 1) and OWS Common Query Language (CQL) serve for predicate-based retrieval on vector and meta data, respectively, a role which WCPS takes on for coverages. The core difference between WMS and the other interfaces mentioned is that WMS delivers portrayed feature and coverage data which are suitable for human viewing, but not for further processing; WFS, WCS, and CS-W, on the other hand, deliver data without semantic loss so that the output can be fed into further processing tools or pipelines (such as GISs).

In this contribution we exclusively focus on service interface standards for coverages, which center around WCS. These are dealt with by the WCS Standards Working Group, in short: WCS.SWG, and the WCPS Working Group. In addition, the Coverages Discussion Working Group (Coverages.DWG) acts as a platform for the exchange and discussion of coverage-related topics, such as features and their formulations in forthcoming specification versions. The author co-chairs these working groups¹. One very actively contributing interest group is GALEON (Geo-Interface to Atmosphere, Land, Earth, Ocean, NetCDF), which regularly reports about implementation experience as well as its view on requirements for future WCS versions; see www.ogcnetwork.net/galeon.
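As a minimal sketch of the GetCapabilities handshake that OWS Common mandates, the following Python fragment assembles such a request as an HTTP GET query string of key-value pairs. The endpoint URL and the helper name kvp_url are assumptions made for illustration; they do not come from any OGC document or implementation.

```python
from urllib.parse import urlencode


def kvp_url(endpoint: str, params: dict) -> str:
    """Append key-value pairs to a service endpoint as an HTTP GET query string."""
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint, used for illustration only.
ENDPOINT = "http://example.org/wcs"

# Ask the service which data sets and capabilities it offers.
capabilities_url = kvp_url(ENDPOINT, {"SERVICE": "WCS", "REQUEST": "GetCapabilities"})
print(capabilities_url)
```

The same helper could serve any other request type of a service, since only the parameter dictionary changes.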
Historically, the WCS 1.0 specification has been the first attempt to standardize raster services. Some shortcomings perceived with respect to clarity and conciseness have been remedied in version 1.1, however, as it turned out, at the cost of an overall harder to understand specification. The current version is WCS 1.1 Corrigendum 2, in short: WCS 1.1.2 [39]. Additionally, beyond the original scope of mainly remote sensing imagery, many more communities and their particular data types meanwhile need to be considered, such as 1-D time series and 4-D climate simulation results. Overall, complexity tends to grow unacceptably, a phenomenon not particular to WCS, but observed with other OGC standards as well.

Recently, therefore, OGC has developed the so-called core/extension paradigm which allows for a guarded modularization of specifications. It mandates that in future OGC standards shall consist of a core specification, which identifies the smallest common denominator, plus an open-ended set of extensions where each extension adds some well-defined feature to the core. An implementor of such a specification set must include the core and can include any subset of the extensions while respecting interdependencies stated by the extensions chosen.

Currently there are two WCS extensions already existing, WCS-T and WCPS. WCS-T (where "T" stands for transactional) extends the delivery service with a data upload facility, thereby allowing services to offer standards-based data ingest facilities [38]. WCPS we will address below. Among the possible further extensions under consideration are generalized coordinate support (to allow any combination of the spatiotemporal axes plus so-called abstract axes), factoring out fully general CRS support into an extension, support for an open-ended list of data exchange formats, and advanced interpolation methods during reprojection and scaling.

In addition and as part of ongoing harmonization between WCS, WFS, SWE, and other standards, the WCS group plans to go one step further and split the core WCS into a Grid Coverages Common (tentative title), which specifies a service-independent coverage structure, and the WCS service as such. The expected benefit is that the coverage concept can be used by other service standards, independently from WCS. For example, in a future scenario a Sensor Observation Service (SOS) might deliver coverages for ingestion through a WCS-T interface feeding a database with a WCS/WCPS front-end. Figure 2 shows a possible lineup of WCS core and extensions. The next section discusses the design goals and considerations which have guided development of WCPS.

Figure 2: Possible WCS reorganization following the core/extension paradigm

¹ Opinions expressed in this article are those of the author, not necessarily official OGC position.

3. REQUIREMENTS

WCS tentatively restricts itself to a few simple operations, mainly: spatio-temporal subsetting, range subsetting (in some domains also referred to as "band selection"), reprojection, scaling, and data format encoding. This makes WCS relatively simple to implement and helps communities to rapidly get their data assets online, accessible through a service complying with open standards. However, among the inputs brought into the WCS.SWG there have been several requests for functionality beyond such simple coverage access, such as deriving the Normalized Difference Vegetation Index (NDVI) from a multi-spectral satellite image. Rather than opening a can of worms by adding an open-ended set of sometimes underspecified algorithms or functions used in different flavours by different communities, the WCS group decided to develop WCPS as an extension to WCS which allows users and/or service providers to phrase a large class of operations themselves on demand.

The intended use of WCPS can be summarized as navigation, extraction, and server-side analysis of large, possibly multi-dimensional coverage repositories. Navigation of coverage data requires capabilities at least like WMS (meaning subsetting, scaling, overlaying, and styling), but on objects of different dimensionalities and often without an intrinsic visual nature (such as elevation or classification data). Versatile portrayal and rendering capabilities, therefore, play an important role. Extraction and download involve tasks like retrieving satellite image bands, performing band combinations, or deriving vegetation index maps and classification; hence, they likewise require subsetting, summarization, and processing capabilities. Analysis mainly includes n-dimensional spatiotemporal statistics. In summary, a range of imaging, signal processing, and statistical operations should be expressible; this has been studied to some extent in [13].

Additionally, the language should not be too distant in its conceptual model from existing geo data processing tools (such as the ones listed in the next section) so that it is economically feasible for vendors to implement the standard as an additional layer on top of their existing products. On a side note, such implementations obviously can still differentiate themselves in terms of performance, scalability, and other factors.

Further, it should be possible for a deployed service to accept new, unanticipated request types without extra programming, in particular not on the server side. The rationale is that both lay users and experts frequently come up with new desires, yet it is not feasible for a service provider to continuously invest in programming new service functionality. Ideally, the service interface paradigm offers open-ended expressiveness available without client-side or server-side programming.
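To make the kind of band combination mentioned above concrete, the following Python sketch computes a vegetation index map cell by cell, using the conventional definition NDVI = (NIR - red) / (NIR + red). It only illustrates the arithmetic a server would have to perform; it is not WCPS syntax, and the function name and the sample band values are invented for illustration.

```python
def ndvi(nir, red):
    """Per-cell NDVI = (nir - red) / (nir + red) for two equally sized 2-D bands."""
    result = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            # Cells with zero total reflectance have no defined index; report None.
            row.append((n - r) / total if total != 0 else None)
        result.append(row)
    return result


# Made-up reflectance values for a 2x2 image, illustration only.
nir_band = [[0.5, 0.75], [0.0, 0.25]]
red_band = [[0.5, 0.25], [0.0, 0.75]]
index = ndvi(nir_band, red_band)
print(index)
```

The point of a processing language is that a client can phrase such a per-cell expression itself, instead of waiting for the provider to program a dedicated NDVI function.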
This calls for a language approach where users (or client developers) can flexibly combine existing building blocks into new functionality. From databases we learn that it is advantageous to craft such a language in a way that it is declarative and safe². "Safe in evaluation" in database speak means that every admissible request will terminate after finite time; the effect is that no Denial of Service (DoS) attack is possible on the level of a single request. In languages like SQL this is achieved by avoiding explicit loop and recursion constructs. Obviously this property requires a tradeoff with respect to the overall expressive power of the language. On the one hand, a large set of statistics, signal, and image processing algorithms needs to be supported. On the other hand, a client must not be given unlimited power over what is executed on the server. We consciously maintain that the processing language be safe in evaluation, thereby retaining, for example, convolutions, but losing, for example, matrix inversion.

² Maybe the most prominent example of a safe and declarative language is SQL.

Declarative languages, as opposed to imperative, procedural ones, allow the user to specify what the result should look like rather than telling the system in what steps this result is computed. Declarativeness not only makes request formulation easier for the user, but also opens up free space for the server to optimize requests, i.e., rearrange evaluation for achieving the same result faster. As our experience with array databases tells that there is a wide range of effective optimization methods on coverage manipulation [40], optimizability of expressions is an important requirement.

Notably we do not demand minimality of the language. We will come back to this aspect in the conclusions, when we can give concrete examples where minimality has adverse effects on usability.

Coverage support no longer is constrained to 2-D imagery and 3-D image time series. For some time now, coverages have been seen within OGC as 1-D to 4-D entities, and several times "abstract", i.e., non-spatiotemporal, axes have been brought into discussion, such as atmospheric pressure. Hence, coverage expressions should allow walking freely through the dimensions, in any combination of spatial, temporal, and abstract axes. For example, 2-D coverages with x and z axes can well occur as slicing results from 3-D or 4-D coverages. On a side note, such considerations for WCPS actually to some extent have driven generalization of the WCS coverage model.

Further, the language should be semantic web ready in that coverage access and manipulation is described in a completely machine-readable form, independent from human intervention when it comes to service discovery and orchestration.

Finally, given that an international standard is aiming at a large and diverse community and stands to assure semantic interoperability between heterogeneous clients and servers, a formal specification of syntax and semantics seems indispensable. Still, the resulting specification document needs to be understandable, in particular to programmers not necessarily familiar with mathematical semantics definition. While the many attempts at combining both properties in a model have shown that this seems close to impossible, a suitable compromise should be aimed at. On a side note, ease of comprehension also rules out a pure XML encoding; languages like XQuery and XPath show how compact language style can be combined with XML.

Figure 3: Excerpt from a WPS process specification

4. STATE OF THE ART

For the design of a standardized coverage processing language suitable for use in a Web environment, we have investigated existing OGC standards, image processing, and image databases to find a suitable basis.

4.1 WPS

The OGC Web Processing Service (WPS) is a standard which specifies a geo service interface for any sort of GIS functionality across a network [34]. A WPS may offer calculations as simple as subtracting one set of spatially referenced numbers from another (e.g., determining the difference in influenza cases between two different seasons), or as complicated as a global climate change model. This model makes WPS especially suitable for "webifying" legacy applications.

Essentially, this XML-based model specifies remote method invocation in the spirit of RPC, CORBA, and Java RMI, but additionally with explicit geospatial semantics in the XML schema. As such, it brings along all the concerns of similar approaches, such as SOAP, one of them being security: a malevolent WPS implementor can implement any kind of server resource access and manipulation, without any control by the system administrator where the service ultimately is deployed.

Another grave shortcoming of WPS is its low semantic level. To understand this better, let us inspect an example. A server-side routine provides a function Buffer which accepts an input polygon and returns a buffered feature. In the corresponding service description, which is based on the standardized WPS XML Schema (see Figure 3), function name as well as input and output parameters are described. This represents the function signature, i.e., operation syntax; looking for the semantics specification, we find XML elements Title and Abstract containing human-readable text. Hence, there is no way for any automated agent to use it without human interference at some point.
This has a number of serious drawbacks:
• WPS consists only of a low-level syntactic framework for procedural invocation without any coverage-specific operations; in other words, the processing functionality itself is not specified, hence any high-level services implemented on top of WCS per se are not standardized and interoperable³;
• SOAP offers only syntactic service interoperability, as opposed to the semantic interoperability of WCPS;
• adding any new functionality to a WPS installation requires new programming on both client and server side;
• such a service cannot be detected by an automatic agent, as the semantics is not machine understandable;
• for the same reason, automatic service chaining and orchestration cannot be achieved; for example, it is unclear to an automatic agent how to connect output parameters of one processing step with the input parameters of the next step due to the missing semantics information;
• with a similar argument, server-internal optimization such as dynamic load balancing is difficult at least.

³ The WPS specification already mentions that it requires specific profiles to achieve fully-automated interoperability.

Hence, WPS foresees that focused application profiles are defined based on the core specification; these profiles, then, are supposed to be crafted so as to indeed allow for interoperability. Following this approach, a WCPS application profile is currently under development in close collaboration with the re-established WPS working group [6].

4.2 Image Processing

We immediately rule out computer vision and image understanding, as these disciplines work on a different semantic level than WCPS is aiming at. Further, answers generated in these domains normally are of probabilistic nature, whereas for WCPS the goal is to allow precise responses whenever possible.

Many image processing languages have been proposed, such as MatLab [23], Erdas Imagine [20], and Envi [17]. These to some extent imperative languages offer a wide range of proven functionality. MatLab is general-purpose and does not itself offer support for geo services (add-on packages accomplish that). This is different for Erdas Imagine and, in particular, for Envi, which offers strong and comprehensive GIS functionality. It seems hard, though, to factor out some self-contained functionality set which is small enough for a standard but still rich enough for multi-dimensional coverage services.

Moreover, these imaging systems appear not immediately suitable for very large multi-dimensional imagery, where "very large" means Terabyte to Petabyte object sizes. Rather, they traditionally are limited to main memory sizes. Recent efforts to support "out-of-memory processing" and swapping of image parts point in the right direction; however, database technology has long dealt with high-volume data, with remarkable success. In particular, request optimization has been studied intensively there. Consequently, we next consider image databases.

4.3 Image Databases

The requirement for further coverage processing capabilities distinguishes array databases from multimedia databases. In multimedia databases images are interpreted using some hand-picked built-in algorithm to obtain feature vectors; subsequently, search is performed on the feature vectors, not the images. The WCPS language, conversely, does not attempt to interpret the coverage data, but always operates on the cell values themselves. Hence, search results are not subject to a probability and do not depend on some random, hidden interpretation algorithm.

Another distinguishing criterion, albeit on the architectural level, is the potentially large amount of data which implementations of the standard need to process efficiently. Image processing systems traditionally are constrained to image sizes less than the main memory available; to some extent, "out of memory" algorithms have been designed which essentially perform partial access and swapping of parts. A systematic approach to processing extremely large raster data volumes is addressed by the research field of array databases.

Historically, the first rigorous treatment is provided by the rasdaman array algebra [4][5], which has been inspired by studying several formal models in imaging, in particular AFATL Image Algebra, a rigorous formalization of image and signal processing with proven comprehensiveness and expressive power [32]. AFATL Image Algebra has been chosen as the basis for rasdaman earlier [5] because of the convergence in algebra, which is the usual mathematical basis for conceptual modeling in databases. Further array algebras are AQL [21], AML [22], and RAM [8]. They all have in common, however, that in principle they are domain-independent and deal with abstract arrays (in a programming language sense), but without intrinsic geo-spatiotemporal semantics; in particular, coordinate system handling is not seamlessly integrated.

The only approach which has been implemented comprehensively and is in operational use under industrial conditions is rasdaman. An advantage is that rasdaman combines a rigorous formal semantics on the levels of query language design, architecture description, and expression of optimization rules. Further, its implementation is proven in earth and life sciences and with multi-Terabyte databases in operational use. Therefore, the WCPS concept has strongly been influenced by the experiences made with rasdaman, but offers an intrinsic geo semantics and adapts them to the coverage model of WCS.

5. WCPS

In this section we first introduce the WCS conceptual coverage model, then briefly summarize WCS functionality, and then discuss the main language constructs of WCPS.

5.1 Coverage Model

The term "coverage", in ISO and OGC definition, denotes some "space-varying phenomenon", i.e., a geographic object with some extent whose values depend on the location (and time) of probing. This very general definition stems from OGC Abstract Specification Topic 6 (Schema for coverage geometry and functions) [27], which is adopted from and identical with ISO 19123 [26]. It foresees several different coverage types: discrete coverages, Thiessen polygons, quadrilateral grid coverages, hexagonal grid coverages, triangulated irregular networks (TINs), and segmented curve coverages. In current practice, however, this general view boils down to discrete raster data; widening WCS scope to further coverage types listed in ISO 19123 is under consideration for future versions. Hence, it is safe to say that WCS in its current version [39] defines an open access protocol for multidimensional raster data.

A coverage basically is a function which maps coordinate locations to values. It is materialized as a multi-dimensional value array, containing cells ("pixels", "voxels") at the grid locations. The set of admissible coordinate values is called the coverage's domain, which is spanned by a number of axes (or dimensions) defining the coverage's dimensionality. For each axis, the coverage is delimited by some lower and upper bound, expressed in some coordinate reference system (CRS). Each coverage has a list of CRSs associated in which it can be queried; requesting values in another CRS than the one in which the coverage is stored (or in the image coordinate system, directly using pixel coordinates) obviously will involve reprojection.

Currently a coverage array can be of two, three, or four dimensions, containing mandatory x and y axes and optional z and time axes. For the next version it is foreseen to additionally allow so-called abstract axes with application-defined semantics (such as products offered). Coverages, then, will be allowed to have any combination of axes, including, for example, 1-D time-only sensor time series, 2-D x/z planes, or 5-D x/y/z/time/pressure cubes.

The structure of a coverage's cell values (denoting the set of all possible values associated with a cell) is given by its range. Range values can be atomic, or a list of named components called range fields (commonly known as "bands" or "channels"). Range fields, in turn, can be atomic or can consist of multi-dimensional arrays of values themselves⁴. Additionally, a coverage may know one or more null values to denote cell values that are unknown or undefined.

For scaling or reprojection performed in the course of request evaluation, resampling and interpolation usually have to be applied. WCS defines the following list of standard interpolation methods, adopted from ISO 19123: nearest (neighbor), linear, quadratic, cubic, and the pseudo method none, which indicates that no interpolation is admissible; further methods may be added by a particular WCS implementation. A subset of these methods is assigned individually to each coverage range field, with one of them being designated as default. During request processing a client may choose among the interpolation methods offered, or simply assume the default method. The effect of null values on interpolation can be controlled via the so-called null resistance parameter.

Further, a coverage is addressed through an identifier which is unique within the server's repository. Finally, some metadata are provided, part of which are optional.
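The interplay between the methods offered per range field, the default method, and the client's choice can be sketched in Python as follows. This is a simplified illustration on a 1-D cell array under stated assumptions: the function names are hypothetical, only nearest, linear, and none are covered, and null values and null resistance are ignored. It does not reproduce the exact semantics laid down in WCS or ISO 19123.

```python
import math


def resolve_method(allowed, default, requested=None):
    """Return the method a request uses: the client's choice if offered, else the default."""
    if requested is None:
        return default
    if requested not in allowed:
        raise ValueError(f"method {requested!r} is not offered for this range field")
    return requested


def interpolate(values, position, method):
    """Evaluate a 1-D cell array at a (possibly fractional) grid position."""
    if not 0 <= position <= len(values) - 1:
        raise ValueError("position lies outside the coverage domain")
    lower = math.floor(position)
    upper = min(lower + 1, len(values) - 1)
    if method == "nearest":
        # Pick the closer of the two neighbouring cells (ties go to the lower one).
        return values[lower] if position - lower <= 0.5 else values[upper]
    if method == "linear":
        weight = position - lower
        return (1 - weight) * values[lower] + weight * values[upper]
    if method == "none":
        # No interpolation admissible: only exact grid positions can be answered.
        if position != lower:
            raise ValueError("interpolation is not admissible for this range field")
        return values[lower]
    raise ValueError(f"unsupported interpolation method: {method}")


series = [10.0, 20.0, 40.0]          # made-up cell values of one range field
offered = {"nearest", "linear"}      # methods assigned to this range field

chosen = resolve_method(offered, default="nearest", requested="linear")
sample = interpolate(series, 1.25, chosen)
fallback = interpolate(series, 1.25, resolve_method(offered, default="nearest"))
print(sample, fallback)
```

Here the client's explicit choice of linear interpolation blends the two neighbouring cells, while omitting the choice falls back to the default, nearest.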

For formalization of the model, so-called probing functions are used. Each probing function extracts one aspect from the coverage; for example, crsSet(C) delivers the set of all CRSs in which coverage C can be addressed. Table 1 lists all probing functions. An example of their rigorous use is given in Section 5.3.6, although for convenience we occasionally employ them before that.

4 The latter feature is recognized as being relatively complex to implement and handle; hence, it is optional now and is likely to be factored out into a bespoke extension in the next WCS version.

5.2

WCS Request Types

The common request structure for OGC standards, which WCS follows, initiates client/server conversation with a GetCapabilities request to learn about a service's offerings and capabilities. Subsequently, the client performs retrieval based on this information; in the case of WCS, this is accomplished with the GetCoverage request type. In addition, WCS foresees a DescribeCoverage request type which delivers detailed information about coverages, such as their extent and the CRSs supported. This is needed because, for transfer volume reasons in the face of very large numbers of server-side coverages, the GetCapabilities response essentially lists only the names of the coverages offered, but no details.

The historically first (or legacy, as some voices say) protocol type for WCS requests is HTTP GET using key-value pair (KVP) notation. Alternatively, XML syntax based on XML Schema definitions is laid down in the standard, with both HTTP POST and SOAP communication being supported.

GetCoverage offers a fixed set of operations which can be combined freely in a request. These operations allow for spatial, temporal, and band subsetting, scaling, reprojection, and final result packaging, including data format encoding. One GetCoverage request always addresses exactly one coverage. The following example shows a sample GetCoverage request against a satellite image time series coverage ModisCube, expressed in KVP notation:

http://myServer/wcsServlet?
    SERVICE=WCS &
    VERSION=1.1.0 &
    REQUEST=GetCoverage &
    COVERAGE=ModisCube &
    RANGESUBSET=nir;red &
    SRS=EPSG:31464 &
    BBOX=4636000.0,5717000.0,4687000.0,5768000.0 &
    TIME=max &
    WIDTH=246 & HEIGHT=300 & DEPTH=1 &
    FORMAT=HDF-EOS &
    EXCEPTIONS=application/vnd.ogc.se_xml

The request extracts data within the given spatial bounding box, which is expressed in CRS EPSG:31464, and fetches the most recent time slice of the cube. The result is scaled to a size of 246x300x1 pixels and delivered in the HDF-EOS format. Any error is to be reported back in XML.

5.3

WCPS Coverage Processing Language

Based on the coverage model presented above, WCPS offers a language to retrieve information from one or more coverages stored on a server. It is a functional language, i.e., without any side effects (except for one case to be detailed below).

The basic request structure consists of a (possibly nested) loop over a list of coverages offered by the server, an optional filter predicate, and an expression indicating the desired processing of each coverage.

5.3.1

Processing Coverages

Table 1: List of coverage probing functions (for some coverage C)

Coverage characteristic | Probing function | Comment
Identifier | identifier(C) | For original coverages only
Grid point values | value(C, p), ∀p ∈ imageCrsDomain(C) | Coverage cells, of data type rangeType(C)
Domain dimension list | dimensionList(C) | List of all of the coverage's axis names, in their proper sequence
Domain dimension type | dimensionType(C, a), ∀a ∈ dimensionList(C) | Dimension type
Image CRS | imageCrs(C) | Image CRS, allowing direct array addressing
Domain extent of coverage expressed in Image CRS | imageCrsDomain(C) | Extent of the coverage in (integer) grid coordinates, relative to the coverage's Image CRS; essentially, the set of all point coordinates
Domain extent of coverage along dimension, expressed in Image CRS | imageCrsDomain(C, a), ∀a ∈ dimensionList(C) | Extent of the coverage in (integer) grid coordinates, relative to the coverage's Image CRS, for a given dimension; essentially, the set of all values inside the extent interval
CRS set | crsSet(C, a), ∀a ∈ dimensionList(C) | Set of all CRSs from the supported CRS
Extent of coverage along dimension, expressed in arbitrary CRS | domain(C, a, c), ∀a ∈ dimensionList(C), c ∈ crsSet(C) | Domain of the coverage, expressed in one of its CRSs, for a given (spatial, temporal, or abstract) dimension
Range data type | rangeType(C) | The data type of the coverage's grid point values
Range field type | rangeFieldType(C, f), ∀f ∈ rangeFieldNames(C) | The data type of one coverage range field
Range field name set | rangeFieldNames(C) | Set of all of the coverage's range field names
Null value set | nullSet(C, r), ∀r ∈ rangeType(C) | The set of all values that represent null as coverage range field value
Default interpolation method | interpolationDefault(C, r), ∀r ∈ rangeType(C) | Default interpolation method, per coverage field
Interpolation method set | interpolationSet(C, r), ∀r ∈ rangeType(C) | All interpolation methods applicable to the particular coverage range field; must list at least the default interpolation method
Interpolation type | interpolationType(im), ∀im ∈ interpolationList(C) | Interpolation type of a particular interpolation method
Null resistance | nullResistance(im), ∀im ∈ interpolationList(C) | Null resistance level of a particular interpolation method

Let be given some n > 0, a set of variable names {$v1, ..., $vn}, a list of n nonempty, not necessarily disjoint coverage identifier sets cov_i = { cov_{i,j} : 1 ≤ j ≤ m_i for some m_i > 0, cov_{i,j} identifier of some coverage known by the server } for 1 ≤ i ≤ n, a predicate B, and a processing expression P, both of which may contain occurrences of the variable names $vi. (Prefixing variables with a "$" character is not mandatory, but is used here to resemble a more XQuery-style syntax.) Then, a WCPS request has the general format:

for $v1 in ( cov_1_1, cov_1_2, ... ),
    ...,
    $vn in ( cov_n_1, cov_n_2, ... )
[ where B( $v1, $v2, ... ) ]
return P( $v1, $v2, ... )
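Read operationally, the general format above amounts to a nested iteration over all combinations of the listed coverages, with the optional predicate acting as a filter. The following Python sketch mimics this evaluation order. The function name evaluate_request and the use of plain lists as stand-ins for server-side coverages are assumptions made for illustration only; they are not part of WCPS.

```python
from itertools import product

def evaluate_request(coverage_lists, predicate, processing):
    """Evaluate a WCPS-like for/where/return request.

    coverage_lists: one list of coverages per variable ($v1 ... $vn)
    predicate:      callable B($v1, ..., $vn) -> bool, or None if absent
    processing:     callable P($v1, ..., $vn) -> one result item
    """
    results = []
    # Visit every combination of coverages, like nested loops would.
    for combo in product(*coverage_lists):
        # A false predicate skips the combination entirely.
        if predicate is not None and not predicate(*combo):
            continue
        results.append(processing(*combo))
    return results

# Hypothetical coverages, modelled as flat lists of cell values.
elevation = [12.0, 873.5, 4810.0]
flatland = [1.0, 2.0, 3.0]

# for $e in ( Elevation, Flatland ) where max($e) > 100 return max($e)
peaks = evaluate_request(
    [[elevation, flatland]],
    lambda e: max(e) > 100,
    lambda e: max(e),
)
```

In this sketch the response is always a list, which may be empty when the predicate rejects every combination.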

The return clause is evaluated for each variable combination, unless predicate B evaluates to false, in which case P is skipped and the current variable combination does not contribute to the overall response. The result of such a request is a (possibly empty) list of items.

The type of expression P determines the overall response structure. A scalar-valued expression leads to a list of result values. The following example returns a single scalar representing the maximum value occurring in elevation coverage Elevation.

for $e in ( Elevation )
return max( $e )

The result, a single floating point number, will be encoded in XML for transfer to the client.

Coverage-valued expressions are treated differently. Let us assume that expression C evaluates to a coverage, as discussed in the later sections, and f is the name of a suitable coverage data exchange format. Then, P has one of the two forms

encode( C, f )
store( encode( C, f ) )

In the first case, the response is a list of encoded coverages; in the second case, the encoding results are stored server-side and URLs for download are passed back to the client instead. The store() function with its side effect is the only exception to the functional style of WCPS.

Example: "The difference between red and near-infrared channel in coverage ModisScene, encoded in TIFF and stored on the server for later fetching":

for $m in ( ModisScene )
return store( encode( abs( $m.red - $m.nir ), "TIFF" ) )

Next, we take a closer look at the operational capabilities of the processing expression. At the core of the language are basic operations for constructing coverages and for summarization over coverages. Further operation classes include convenience shorthands and special operations like scaling and reprojection.

5.3.2

Coverage Constructor

The coverage constructor expression creates a d-dimensional coverage and assigns values to its cells. The domain definition consists, for each dimension, of a unique axis name plus lower and upper bound of the coverage, expressed in a fixed image CRS and using integer coordinates. No other CRS is supported initially; however, the setter function setCRS() allows adding a further supported CRS. The coverage's content is defined by a general expression with some scalar result type, which at the same time determines the result range type.

Let be given a unique coverage name F, some n > 0, a set t1, ..., tn of axis types, a set $n1, ..., $nn of names, a set of pairs (l1, h1), ..., (ln, hn) with li < hi for i ∈ 1, ..., n, and an expression e which may contain occurrences of the $ni and evaluates to a scalar value. Then, the syntax for the constructor is as follows:

coverage F
over t1 $n1 (l1,h1), ..., tn $nn (ln,hn)
values e

For example, a 2-D greyscale image aligned in the x/y plane containing a diagonal shade can be written as below:

coverage Greyshade
over x $i (0,255),
     y $j (0,255)
values (char) ($i+$j)/2

Note that the sequence in which axes are indicated in an expression is completely independent from the sequence of linearization as stored on the server (such as row-major order). This is part of WCPS's hiding of storage internals.

5.3.3

Condensers

This operation class, which is similar to SQL aggregates, consolidates the grid point values of a coverage along selected axes into a scalar value based on the condensing operation indicated.

Let be given some operation op ∈ {+, ∗, max, min, and, or}, n > 0, a set {$n1, ..., $nn} of axis names, a set {t1, ..., tn} of axis types, a set of pairs {(l1, h1), ..., (ln, hn)} with li < hi for i ∈ 1, ..., n, a boolean expression p which may contain occurrences of $n1, ..., $nn, and an expression e which may contain occurrences of $ni and evaluates to a scalar value. Then, the syntax for the condenser is as follows:

condense op
over t1 $n1 (l1,h1), ..., tn $nn (ln,hn)
[ where p ]
using e

The operator iterates over the given domain while combining the result values of e through operator op. The where clause allows excluding cells based on their coordinates as well as their values. The following expression delivers the sum of the absolute values of all cells inside the x/y bounding box (0,0)/(99,99) of coverage expression C:

condense +
over x $i (0,99), y $j (0,99)
using abs( C[x($i),y($j)] )
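The condenser can be thought of as a fold of the operation op over all points of the iteration domain. The Python sketch below illustrates this reading under simplifying assumptions: coverages are nested lists, axis types are ignored, and the names condense and OPS are hypothetical helpers, not WCPS or rasdaman functions.

```python
from functools import reduce
from itertools import product
import operator

# Condensing operations named in the text, as binary Python functions.
OPS = {
    "+": operator.add,
    "*": operator.mul,
    "max": max,
    "min": min,
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
}

def condense(op, axes, using, where=None):
    """Combine using(point) over all points of the domain with op.

    axes:  list of (low, high) integer bounds, both inclusive
    using: callable taking one coordinate per axis, returns a scalar
    where: optional callable with the same arguments, returns bool
    """
    ranges = [range(lo, hi + 1) for lo, hi in axes]
    values = [
        using(*point)
        for point in product(*ranges)
        if where is None or where(*point)
    ]
    # Assumes at least one point passes the filter (reduce needs input).
    return reduce(OPS[op], values)

# Hypothetical 100x100 coverage C with cell value x - y.
C = [[x - y for y in range(100)] for x in range(100)]

# condense + over x $i (0,99), y $j (0,99) using abs( C[x($i),y($j)] )
total = condense("+", [(0, 99), (0, 99)], lambda i, j: abs(C[i][j]))
```

The optional where argument plays the role of the where clause: points for which it returns false do not contribute to the result.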

5.3.4

Shorthands

The previously introduced operations can be composed into a wide range of operations, although sometimes with considerable syntactic burden. To keep simple things simple, shorthands are introduced for important operation classes. Subsetting can be subdivided into sectioning and slicing. A section operation receives a coverage, an axis, and an interval on this axis. This interval determines the new coverage's extent along this axis. Hence, the coverage's extent is reduced while the dimension remains the same. Slicing, on the contrary, reduces dimensionality. This operation extracts a spatial slice (i.e., a hyperplane) from a given coverage expression, specified by one or more slicing axes and a slicing position on each of them.
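The difference between sectioning and slicing can be made concrete with a small Python sketch. It assumes, purely for illustration, that a coverage is a dictionary from integer coordinate tuples to cell values plus a list of axis names; the helper names section and slice_at are hypothetical.

```python
def section(cov, axes, axis, lo, hi):
    """Section: keep cells with lo <= coordinate <= hi on the given axis."""
    k = axes.index(axis)
    return {p: v for p, v in cov.items() if lo <= p[k] <= hi}, axes

def slice_at(cov, axes, axis, pos):
    """Slice: fix one axis at pos and drop that axis from the result."""
    k = axes.index(axis)
    new_axes = axes[:k] + axes[k + 1:]
    new_cov = {p[:k] + p[k + 1:]: v for p, v in cov.items() if p[k] == pos}
    return new_cov, new_axes

# Hypothetical 3-D x/y/z coverage with extent 0..3 on each axis.
axes = ["x", "y", "z"]
cube = {(x, y, z): 100 * x + 10 * y + z
        for x in range(4) for y in range(4) for z in range(4)}

trimmed, trimmed_axes = section(cube, axes, "x", 1, 2)       # still 3-D
plane, plane_axes = slice_at(trimmed, trimmed_axes, "z", 3)  # now 2-D
```

Sectioning leaves the axis list unchanged and only shrinks the extent, whereas slicing removes the sliced axis from the result.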

By default, the coverage's image CRS is used for addressing. However, a qualifier with each axis may change this to express location in some other supported CRS. We will encounter an example in Section 6.2.

Example: The following expression subsets a 4-D x/y/z/time coverage expression C by cutting out an interval from 100 to 200 along the x axis and slicing at z position 42; the result is a 3-D x/y/t coverage:

C[ x(100,200), z(42) ]

For an assumed y extent of (y0, y1), the coverage constructor expression equivalent to the above shorthand is

coverage Slice
over x $cx (100,200),
     y $cy (y0,y1)
values C[ x($cx), y($cy), z(42) ]

Induced operations lift operations available on the range type to coverage level by applying them simultaneously to all cells of a given coverage. We abbreviate the corresponding coverage constructor expression by the base operation. For example, cell-wise addition of coverage expressions C and D with the same extent as before is written as:

C + D

The usual arithmetic, boolean, logarithmic, and trigonometric operations are supported as induced operations, likewise record access and type cast as known from programming languages. Example: The expression below evaluates to a boolean coverage:

( C.red + D.red ) > 127

Likewise, common condensers can be abbreviated. If iteration uniformly goes over all cells of a coverage, and the expression evaluated at each location is based only on the cell value, but not on its coordinate values, then a shorthand operation can be applied. Example: "The maximum value in the temperature variable of coverage ClimateRun":

max( ClimateRun.temperature )

Finally, some specification is needed as to what happens if one of the operands contains null values. Whenever a cell value is encountered which is listed in its coverage's null value set, the result of the value combination will be set to one of these null values (the default null if defined). In case of a binary operation, the situation is more complicated. If one operand has a null value as per its coverage's null value set, then the overall result will be null if there is some null value available in the intersection of both participating coverages' null value sets. If this is not the case, then an exception will be thrown. For filter predicates in the where clause we decided to adopt a rigid approach: a boolean null value will be interpreted as false, thereby effectively dropping the element at hand from the result list.

The Sobel filter, a well-known edge detector, may serve as a final, more complex example. Let the kernel be given by the matrix shown in Figure 4; for simplicity we assume it is stored as a coverage as well, named Kernel3x3. Then, the following expression returns a coverage with the same extent as the original one, but with values replaced by the edge detector result.

Figure 4: 3x3 Sobel edge detection filter kernel

for $img in ( Image ), $k in ( Kernel3x3 )
return encode(
  coverage filteredImage
  over x $ix( imageCrsDomain( $img, x ) ),
       y $iy( imageCrsDomain( $img, y ) )
  values ( condense +
           over x $fx( -1, +1 ), y $fy( -1, +1 )
           using $img[ x($ix+$fx), y($iy+$fy) ] * $k[ $fx, $fy ]
         + condense +
           over x $fx( -1, +1 ), y $fy( -1, +1 )
           using $img[ x($ix+$fx), y($iy+$fy) ] * $k[ $fx, $fy ]
         ) / 9,
  "png" )

Figure 5: Edge detector as a filter kernel example

5.3.5

Further Operations

In addition to the abovementioned operations there are further ones for scaling and reprojection. We omit their discussion for the sake of brevity, as they behave in the standard way, but provide a scaling example in Section 6.2. All operations can be nested arbitrarily as long as data types match. For convenience, type coercion and extension as known from programming languages is provided. Parenthesizing and implicit precedence rules are available for the syntax representation as used above, but obviously are not needed for the XML expression encoding.

5.3.6

Semantics Specification

The specification approach for WCPS can be characterized as semi-formal: a fixed framework is followed which lends itself towards usual semantics specification; however, it resorts to informal description in cases where a formalization would constitute an inappropriate burden while the concepts are well known in the GIS community anyway. The semantics of each operation is defined through its precondition (such as only positive cells when applying a logarithm) and postcondition. The previously introduced probing functions serve to describe the operation postcondition. Similar to algebraic specification of Abstract Data Types, the effect of applying an operation to a coverage expression is described by applying every probing function to the resulting coverage.

We illustrate the semantics definition by means of the language element binaryInducedExpr, i.e., the binary induced comparison of values; it belongs to the class of coverageExprs. The specification relies on the probing functions introduced earlier (see Table 1; comments in the table have been added for this article).

Let C1, C2 be coverageExprs where imageCrsDomain(C1, a) = imageCrsDomain(C2, a), imageCrs(C1, a) = imageCrs(C2, a), domain(C1, a) = domain(C2, a), ∀a ∈ dimensionList(C2) : crsSet(C1, a) = crsSet(C2, a), rangeFieldNames(C1) = rangeFieldNames(C2), ∀f ∈ rangeFieldNames(C1) : rangeType(C1, f) is cast-compatible (footnote 6) with rangeType(C2, f) or rangeType(C2, f) is cast-compatible with rangeType(C1, f).

6 see [2], Section 7.2.5

Then, for any coverageExpr C3 of structure C1 > C2 the semantics of C3 is defined as follows:

• identifier(C3) = "" ("derived coverage has no name - it is not stored and, hence, inaccessible by name.")
• ∀p ∈ imageCrsDomain(C3) : value(C3, p) = value(C1, p) > value(C2, p) ("value is given by performing operation cellwise.")
• dimensionList(C3) = dimensionList(C1)
• ∀a ∈ dimensionList(C3) : dimensionType(C3, a) = dimensionType(C1, a)
• imageCrs(C3) = imageCrs(C1) ∩ crsSet(C2) ("CRSs supported are the ones which both input coverages share")
• imageCrsDomain(C3) = imageCrsDomain(C1)
• ∀a ∈ dimensionList(C3) : crsSet(C3, a) = crsSet(C1, a)
• ∀a ∈ dimensionList(C3), c ∈ crsSet(C3, a) : domain(C3, a, c) = domain(C1, a, c) ("extent is that of input coverages for each axis and in each of its CRSs")
• ∀r ∈ rangeFieldNames(C3) : rangeFieldType(C3, r) = Boolean ("for all range fields: result cell type is Boolean")
• ∀r ∈ rangeFieldNames(C3) : nullSet(C3, r) = {} ("for all range fields: result has no null values")
• ∀r ∈ rangeFieldNames(C3) : interpolationDefault(C3, r) = none, interpolationSet(C3, r) = {none} ("result coverage does not allow interpolation")

Essentially, this specification says that the resulting coverage has the same extent as the original coverages, is addressable in only those CRSs supported by both input coverages, and that its values are derived from cellwise comparison. Some constituents are not set, such as identifier and applicable interpolation methods; setter functions exist which can change these subsequently, for example, to set the identifier of the expression to ComparisonResult:

setIdentifier( C > D, "ComparisonResult" )

Note that this does not lead to server-side storage and subsequent accessibility; it merely changes metadata transferred to the client as part of the overall response. Setting the identifier might make sense when the coverage result is reinserted into some (same or different) server during a WCS-T upload [38].
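To illustrate how such a postcondition-style definition can be read, the following Python sketch builds the result of an induced comparison and fills in a few of the properties named by the probing functions. It is a strongly simplified model: coverages are dictionaries, only the grid domain is checked as a precondition, and the names induced_greater and set_identifier are hypothetical stand-ins rather than functions of any WCPS implementation.

```python
def induced_greater(c1, c2):
    """Cell-wise C1 > C2, mirroring some of the postconditions above."""
    # Precondition (simplified): both operands cover the same grid points.
    if c1["cells"].keys() != c2["cells"].keys():
        raise ValueError("operands must share the same image CRS domain")
    return {
        "identifier": "",            # derived coverage has no name
        "cells": {p: c1["cells"][p] > c2["cells"][p] for p in c1["cells"]},
        "range_type": "Boolean",     # result cell type is Boolean
        "null_set": set(),           # result has no null values
        "interpolation": {"none"},   # result does not allow interpolation
    }

def set_identifier(cov, name):
    """Return a copy with a new identifier; nothing is stored anywhere."""
    return {**cov, "identifier": name}

# Hypothetical input coverages over the same two grid points.
C = {"identifier": "C", "cells": {(0, 0): 5, (0, 1): 1}}
D = {"identifier": "D", "cells": {(0, 0): 3, (0, 1): 7}}

# setIdentifier( C > D, "ComparisonResult" )
result = set_identifier(induced_greater(C, D), "ComparisonResult")
```

As in the text, setting the identifier only changes metadata of the result; the sketch does not model server-side storage.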

5.4

The WCPS Reference Implementation

In the PetaScope project, Jacobs University is undertaking the reference implementation of WCPS. PetaScope consists of a service stack as shown in Fig. 6. A Java servlet accepts XML requests, which must conform to the WCPS schema, and returns coverage results. Coverage results are returned as a multipart HTTP response containing an XML document (the so-called "manifest") holding the metadata and one or more files holding the binary coverage data in the requested encoding format.

Figure 7: Search across a set of timeseries (source: EarthLook, www.earthlook.org )

Figure 6: WCPS reference implementation architecture

The service uses the array database system rasdaman as its backend, as rasdaman is already capable of storing and querying multi-dimensional raster data over any C/C++ cell type [3]. The WCPS web service component translates a WCPS request into the rasdaman query language, rasql [28], and hands this to rasdaman for processing. The results obtained from rasdaman are MIME-encoded and shipped back to the client, together with the XML-encoded manifest describing them. Rasdaman utilizes a relational DBMS as its persistent storage layer. Large arrays are partitioned into smaller ones, so-called tiles, which then go into one BLOB (Binary Large OBject, i.e.: a byte string maintained in the database) each. The WCPS component itself additionally stores metadata information about the coverages that it serves. In PetaScope, rasdaman makes use of the PostgreSQL open-source DBMS to physically store data and metadata. As performance evaluations are not yet available, only preliminary observations can be made. The translation from the WCPS request into a rasql query appears to take only a few milliseconds. Rasdaman, which performs the main workload in the end, has been benchmarked, e.g., in [40, 5, 31, 10, 1]. Upon sufficient completion, the source code will be made available under a free license at www.petascope.org. The package eventually will consist of a comprehensive WCS suite, offering WCS, WCPS, and WCS-T.

6.

SAMPLE USE CASE SCENARIOS

OGC standards stand out in that they are thoroughly evaluated for practicality, usability, and adequacy before official release. Central concepts of WCPS have proven successful in rasdaman during its many years of operational use. Further practical assessment has been performed using PetaScope; some of the use cases inspected are now publicly accessible through www.earthlook.org, including a sandbox for hands-on experimenting on sample data sets.

In this Section we discuss some hand-picked application use cases addressing both typical current and future expected scenarios. Among the fields recently brought into the WCS working group are areas as diverse as sensors (in the broadest sense of the term), exploration, atmospheric and hydrospheric modeling, environmental monitoring, marine biology, biodiversity, and aerosol chemistry; certainly this list is by no means representative nor exhaustive.

6.1

1-D Sensor Time Series

One-dimensional time series form a kind of coverage which has received attention by the WCS group only recently. In particular, this was induced by harmonization work with the Sensor Web Enablement developers where, on the one hand, sensor time series obviously play a central role and, on the other hand, data structures are grounded in WCS when it comes to coverages.

The first use case searches within a given time series to flag whenever a threshold T is exceeded. The WCPS request below returns a standardized time series, encoded as comma-separated values (CSV), with value true whenever threshold T is exceeded, and false otherwise.

for $ts in ( TimeSeries )
return encode( ( $ts > T ), "csv" )

For the second use case, the following request picks only those time series objects where the difference between maximum and minimum value is below threshold T (Figure 7).

for $ts in ( TimeSeries_1, ..., TimeSeries_n )
where abs( max($ts) - min($ts) ) < T
return identifier( $ts )

The response in this case consists of a list of (locally unique) coverage names, hence no encoding needs to be applied.

In some disaster mitigation scenarios it might be of interest to quickly learn about status changes. Simple standing queries like the ones below can deliver, for any time T, the cumulative average, for example:

for $ts in ( TimeSeries )
return $ts[ time( imageCrsDomain( $ts, T ) ]

for $ts in ( TimeSeries )
return avg( $ts )

Figure 8: Alerter functionality implemented through standing queries in a browser (source: EarthLook)

Figure 8 shows how these requests are used in an alerter script where client-side Javascript is used to continuously resend the request and color the result according to some threshold value.
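The two time series use cases of this section can be mirrored in a few lines of Python. This is a sketch under the assumption that a time series is a plain list of numbers; the function names and the sample readings are hypothetical and only illustrate the induced comparison and the where filter.

```python
def flag_exceedances(series, threshold):
    """Induced comparison $ts > T: one boolean per time step."""
    return [value > threshold for value in series]

def select_stable(named_series, threshold):
    """Names of series whose max-min spread stays below the threshold."""
    return [
        name
        for name, series in named_series.items()
        if abs(max(series) - min(series)) < threshold
    ]

def to_csv(values):
    """Encode a 1-D result as one comma-separated line."""
    return ",".join(str(v).lower() for v in values)

# Hypothetical sensor readings.
series = {"TimeSeries_1": [3.0, 3.5, 2.9], "TimeSeries_2": [1.0, 9.0, 4.0]}

flags = to_csv(flag_exceedances(series["TimeSeries_2"], 3.5))
stable = select_stable(series, 1.0)
```

The first result is an encoded boolean series of the same length as the input, while the second is merely a list of names, which is why no encoding step is applied to it.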

6.2

2-D Web Map Service

The OGC Web Map Service (WMS) Implementation Standard [9] is the most widely used OGC standard, probably due to the user friendliness of the interactive clients which can be built on top of this client/server protocol. WMS provides the basis for what has been termed Web-GIS functionality: via their Web browser users can navigate a map dynamically composed of different layers, with each layer rendered according to some chosen style definition. Usually an interactive client provides convenient map navigation, such as interactive zoom and pan. To show the versatility of WCPS we demonstrate that WMS-type queries can be expressed in it. The following is a typical WMS GetMap request in KVP syntax, taken from [9]:

http://a-map-co.com/mapserver.cgi?
    VERSION=1.2.0 &
    REQUEST=GetMap &
    CRS=CRS:4326 &
    BBOX=-97.105,24.913,-78.794,36.358 &
    WIDTH=560 & HEIGHT=350 &
    LAYERS=AVHRR_09_27 &
    STYLES= &
    FORMAT=image/png

The request accesses layer AVHRR_09_27 and retrieves a cutout given by bounding box (-97.105,24.913,-78.794,36.358) expressed in the coordinate reference system (CRS) identified by EPSG code 4326 and using OGC's URN-style syntax [39]. As no style is specified, the default will be applied. The resulting image is scaled to size 560x350 and then delivered in PNG format. Assuming that the AVHRR coverage is already stored as a color image, such a request can be formulated in WCPS immediately:

for $a in ( AVHRR_09_27 )
return encode(
  scale( $a[ x:"urn:ogc:def:crs:EPSG:4326" (-97.105,-78.794),
             y:"urn:ogc:def:crs:EPSG:4326" (24.913,36.358) ],
         { x(0:559), y(0:349) },
         {} ),
  "png"
)

The expression is best understood by reading it from the inside out. The coverage, represented by variable $a, is subset with the coordinates indicated for each axis, along with the CRS in which the coordinates are expressed. Next, the resulting image is scaled to the x and y extent indicated; the lower bound is set to 0 here, but could be any integer value. The third scaling parameter allows indicating the interpolation method to be applied. As WMS does not allow stating such details, the list is left empty, meaning that the server will apply the default interpolation. Finally, the result is encoded in the PNG format.

This request translation technique has been used successfully for many years in rasgeo, the rasdaman WMS. On the EarthLook website, www.earthlook.org, several demonstration WMS instances are provided, some of which are based on WCPS, and some on the rasdaman query language, rasql. All services ultimately maintain their data via rasdaman. Figure 9 shows a screenshot using the rasdaman WMS client.

Figure 9: Browser-based WMS navigation (source: EarthLook)

Finally, we consider deriving summary data from maps. While this is not within the range of WMS, it is indeed relevant for imaging and GIS data analysis. The following WCPS code derives the histogram for an 8-bit greyscale satellite image channel:

for $ls in ( LandsatScene )
return encode(
  coverage LandsatRedHistogram
  over abstract $n( 0, 255 )
  values count( $ls.red = $n ),
  "csv" )

The induced comparison $ls.red = $n establishes a boolean matrix with a value of true iff the red band's intensity value equals the current bucket number, $n. The count operator inspects this matrix and counts the occurrences of true. The results are cast into a new coverage which is 1-D over an abstract dimension running from 0 to 255. An appropriate data format for shipping this 1-D coverage is CSV.

Figure 10: DFD-DLR WCS demonstration service with 1-D to 3-D extraction results

Figure 11: 2-D and 3-D Slices from a 4-D ECHAM T42 climate data set (horizontal wind speed)
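The histogram request can be paraphrased in Python as one count per bucket. The sketch assumes that the red band is available as a nested list of 8-bit integers; the function name histogram and the tiny sample band are hypothetical.

```python
def histogram(band, buckets=256):
    """For each bucket n, count the cells whose value equals n."""
    cells = [v for row in band for v in row]
    # Mirrors: over abstract $n(0,255) values count( $ls.red = $n )
    return [sum(1 for v in cells if v == n) for n in range(buckets)]

# Hypothetical 2x3 red band of an 8-bit image.
red = [[0, 255, 7], [7, 7, 0]]

hist = histogram(red)
csv_line = ",".join(str(c) for c in hist)
```

This direct transcription scans all cells once per bucket, which follows the declarative formulation closely; a server is free to evaluate the same request with a single pass over the data.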

6.3

3-D Remote Sensing Time Series

In an early WCS experiment, a 3-D satellite image time series has been established based on rasdaman and Oracle by the Remote Sensing Data Center (DFD) of the German Aerospace Center (DLR) [11]. IDL on the Net has been used for building the Web interface. AVHRR imagery representing land / sea surface temperature has been mosaicked into a map of Europe and the Mediterranean, and then has been extended in time over an interval of several years. Altogether, the database consists of about 10,000 AVHRR images collated into one x/y/t data cube. Figure 10 shows the Web interface together with some retrieval results. Users can draw a bounding box for spatial selection and additionally indicate time intervals. The result, then, is a 3-D subcube. Alternatively, 2-D time slices and 1-D drill-through time series can be extracted. For example, the middle-right image shows the temperature curve over Moscow for one year. In this use case it might be advantageous not only to select the cell identified by the coordinate location, but to average over some region to eliminate atmospheric distortions and other potential effects. The following WCPS request constructs the temperature time series by averaging over a 3x3 area around the chosen location x0/y0 for time interval t0 to t1:

for $a in ( AVHRR_cube )
return encode(
  coverage TemperatureTimeSeries
  over time $t(t0,t1)
  values avg( $a[ x( x0-1, x0+1 ), y( y0-1, y0+1 ), time( $t ) ] ),
  "csv" )

Note that the data volume shipped over the net is about 1 kB, in contrast to the 10,000 image data cube. By expressing the user's need concisely the data volume can be reduced, as will also be discussed in the next section.

The bottom-right image shows the Normalized Difference Vegetation Index (NDVI). Rasdaman would have allowed deriving this from suitable satellite data (such as the Landsat instruments), but WCS does not offer such processing capability. In WCPS the NDVI extraction from near-infrared and red Landsat channels can be phrased as follows:

for $lm in ( LandsatMosaic ) return encode( ( $lm.nir - $lm.red ) / ( $lm.nir + $lm.red ), "tiff" )
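The NDVI expression is a cell-wise (induced) combination of two bands. A minimal Python sketch, assuming both bands are equally sized nested lists of floats and that no cell has nir + red equal to zero, looks as follows; the function name ndvi and the sample values are hypothetical.

```python
def ndvi(nir, red):
    """Cell-wise (nir - red) / (nir + red) for two equally sized bands."""
    # Assumes nir + red is non-zero in every cell.
    return [
        [(n - r) / (n + r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir, red)
    ]

# Hypothetical 1x2 band excerpts.
nir_band = [[0.5, 0.4]]
red_band = [[0.1, 0.4]]

index = ndvi(nir_band, red_band)
```

The WCPS request additionally encodes the resulting coverage, here as TIFF, which this sketch leaves out.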

6.4

4-D climate

The four-dimensional use case is chosen from climate modeling. Query-based access to multi-dimensional earth science data has been investigated earlier in the EU-funded ESTEDI project; see also [3]. The ECHAM T42 model is used for atmospheric simulation. It generates relatively low-resolution data with a spatial resolution of 128 x 64 cells for the complete earth surface. Temporal resolution is 24 minutes per time slice. Over a simulation period of 200 years this accumulates to roughly 2 million slices, corresponding to approximately 2.5 TB. This holds for one physical parameter ("variable"), such as temperature, wind speed in x and y direction, CO2 concentration, etc.; up to and over 50 variables can occur. Figure 11 shows 2-D and 3-D slices of the x wind speed component obtained from an ECHAM T42 model run.

Interestingly, it has been observed that users (in this case mostly scientists) download about ten times more data than they actually need [19]. This results from unwieldy FTP archives where users have to find their way through large files which they have to download, followed by writing their own code for extracting the pieces of interest. Conversely, this means that by offering extraction and preprocessing capabilities on an adequate semantic level, bandwidth usage and transfer times potentially can be reduced by a factor of 10, not to speak of the enhanced quality of service.

Again, we discuss some typical operations on ECHAM T42-like data sets. The first use case requests wind speed in x direction at location x0/y0, expressed in CRS EPSG:4326, at height 0 over ground for time interval t0 to t1. This obviously returns a 1-D time series. An appropriate format for delivering such values is CSV.

for $e in ( ECHAM_T42 )
return encode(
  $e.windspeedX[ x:"CRS:4326"( x0 ), y:"CRS:4326"( y0 ), z( 0 ), time( t0, t1 ) ],
  "csv" )

Note that the syntax does not prescribe the evaluation sequence; based on provable semantic equivalence a server can decide whether it first performs the subsetting (which is better in the face of a voxel-interleaved storage) or the component extraction (which yields faster results with band-interleaved storage).

The next use case asks for the average temperature at ground level for all time slices. Again, the result is a one-dimensional time series of float values.

for $e in ( ECHAM_T42 )
return encode(
  coverage AverageTemperature
  over time $t (imageCrsDomain($e,time))
  values avg( $e[ time( $t ) ] ),
  "csv" )

This time we need a coverage constructor because for each time step some processing is to be applied, for which a time position variable is required. The coverage clause generates a 1-D time series by iterating over the time axis of ECHAM_T42. Note the typing of the domain axis, which allows the resulting coverage to know about the semantics of its axis later on. For each slice the avg operation summarizes its values.
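The combination of a coverage constructor over the time axis with an avg condenser per slice can be sketched in Python as follows. The sketch assumes that the cube is a list of 2-D time slices, each a nested list of floats; the function name and the sample cube are hypothetical.

```python
def average_per_time_slice(cube):
    """cube[t] is a 2-D slice (list of rows); returns one mean per t."""
    series = []
    for time_slice in cube:                      # iterate over the time axis
        cells = [v for row in time_slice for v in row]
        series.append(sum(cells) / len(cells))   # avg() over the slice
    return series

# Hypothetical cube with two 2x2 time slices.
cube = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]

averages = average_per_time_slice(cube)
```

The loop variable corresponds to the time position variable $t of the request, and the resulting list is the 1-D time series that would be shipped as CSV.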

6.5

Data Warehousing

Suitability of WCPS for OLAP-style queries is an important prerequisite for cross-domain services where statistical (such as business) data are merged with geospatial coverage data (such as remote sensing imagery). To demonstrate the capabilities of WCPS to model multi-dimensional data beyond spatio-temporal semantics we choose a data warehousing / OLAP scenario. Following the classical definition of [16], a data warehouse is a topical, time-aware excerpt from one or more operative databases. Usually a data warehouse is organized as a so-called data cube where dimensions, defined by measures, span a data space in which facts sit. Typically a data cube has between three and twelve dimensions of which one usually is time, as aspects of enterprise behavior over time are modelled.

Consider a sample miniworld about car repair frequency observed by their garage visits [33]. An event is a garage visit, identified by vehicle, customer, date, and garage. Each such event is additionally described by repair costs, number of garage employees involved, and the duration of the repair. Figure 12 shows an aggregated tabular view of such events. For the modeling we use the graphical ME/R notation introduced by Sapia [33] as shown in Figures 13 and 14.

Figure 12: Sample aggregated view of car repair data

Figure 13: ME/R schema of sample OLAP cube

Figure 14: ME/R schema, with dimension hierarchies

The data cube, named repair, has measures vehicle, customer, garage (resp. their identifiers), and date of arrival. Each fact has attribute values cost, number of employees involved, and repair duration. The resulting structure is termed a star schema in case of a single cube like in our example. In the presence of several cubes the structures arising have been described as snowflake and galaxy schemas; see, e.g., [24] where several further variants and extensions have been proposed in addition.

Among the common operations on such cubes is aggregation along a multitude of different criteria. Such aggregation is defined on the dimensions by dimension hierarchies which offer stepwise coarser views on the data. Users operate with spreadsheets as frontend which generate queries and return the results in tabular or graphical view.

We pick two typical OLAP query types, slicing and roll-up. The slicing query asks for a list of all repair events of vehicle brand B in garage G within the last ten days. Assuming T as the maximum time coordinate we obtain:

for $cube in ( RepairCube )
return encode(
  $cube[ vehicle( B ), garage( G ), time( T-10, T ) ],
  "csv" )

The roll-up scenario requests a summarization, the number of repairs per garage and year; the result is a 2-D cube with remaining dimensions garage and time. For syntactic simplification we assume a cube extent of t0 to t1 for the time dimension, g0 to g1 for garage, v0 to v1 for vehicle, and c0 to c1 for customer, resp. For simplicity we construct a new abstract axis for the years, instead of using the normal time axis. Again, the result is returned in CSV.

for $cube in ( RepairCube )
return encode(
  coverage RollupCube
  over abstract $g(g0,g1),
       abstract $y(t0:t1/365)
  values condense +
         over day $d(0,364),
              vehicle $v(v0,v1),
              customer $c(c0,c1)
         using $cube[ time( $d+($y-t0)*365 ), garage( $g ), vehicle( $v ), customer( $c ) ],
  "csv" )

Obviously this operation is structurally close to a scaling along one dimension using linear interpolation. Actually, on a side note we claim that there is much similarity between OLAP and spatio-temporal raster data. This gave rise to one of our research strands, in which we work on extending the OLAP concept of dimension hierarchies in a way that also suits geospatial semantics. The expected benefit lies not only in the conceptual unification, but possibly also in new internal optimization techniques. For example, one of our research activities investigates applying OLAP preaggregation to multi-dimensional raster images, with the aim of extending the concept of image pyramids to more than two dimensions [13].
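The two OLAP query types above can be mimicked outside WCPS to make their semantics concrete. The following is a minimal Python sketch, not part of the standard or of any implementation: the fact list `FACTS`, the helper names `slice_cube` and `rollup_repairs`, and the sample values are all hypothetical. It stores the repair cube sparsely as a list of facts and, like the roll-up query, assumes a year of 365 days. Counting facts stands in here for the `condense +` summation over the dense cube.

```python
from collections import defaultdict

# Hypothetical miniature repair cube, stored sparsely:
# one tuple per fact, (vehicle, customer, garage, day, cost).
FACTS = [
    ("B", "c1", "G", 725, 120.0),
    ("B", "c2", "G", 729, 80.0),
    ("A", "c1", "G", 728, 45.0),
    ("B", "c3", "H", 400, 300.0),
    ("B", "c1", "G", 10, 60.0),
]
DAYS_PER_YEAR = 365  # same simplification as in the roll-up query


def slice_cube(facts, vehicle, garage, day_lo, day_hi):
    """Slicing: fix vehicle and garage, trim the time axis to [day_lo, day_hi]."""
    return [f for f in facts
            if f[0] == vehicle and f[2] == garage and day_lo <= f[3] <= day_hi]


def rollup_repairs(facts, t0=0):
    """Roll-up: number of repairs per (garage, year).

    Day-of-year, vehicle, and customer are aggregated away, which leaves
    a 2-D result over garage and the abstract year axis.
    """
    counts = defaultdict(int)
    for _vehicle, _customer, garage, day, _cost in facts:
        year = (day - t0) // DAYS_PER_YEAR
        counts[(garage, year)] += 1
    return dict(counts)


if __name__ == "__main__":
    T = max(f[3] for f in FACTS)  # maximum time coordinate
    print(slice_cube(FACTS, "B", "G", T - 10, T))
    print(rollup_repairs(FACTS))
```

The sparse representation is a choice made for brevity; a dense 4-D array would follow the coverage model more closely, at the cost of a longer example.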

7. CONCLUSION AND OUTLOOK

We presented the Web Coverage Processing Service as the new OGC standard for flexible, high-level coverage processing services. WCPS was approved by OGC as an official standard in December 2008. It bridges WCS, which it extends with coverage processing, and WPS, which it extends with a well-defined processing semantics. Both benefit from the flexibility of allowing ad-hoc formulation of complex requests without any server or client side code recompilation.

In the requirements analysis we mentioned that minimality is not among our goals, but gave no further justification. In retrospect this can be detailed now. A minimal operational set would abandon the trim and slice shorthand, all the induced operations, and the condensers. For example, the request

    for $c in ( MyCoverage )
    return
        all( $c[ x(0:99), y(0:99) ] > 127 )

can be expressed as

    for $c in ( MyCoverage )
    return
        condense and
        over cx x(0:99),
             cy y(0:99)
        using (coverage MyBand
               over dx x(0:99),
                    dy y(0:99)
               values $c[ x(dx), y(dy) ] > 127 )[ x(cx), y(cy) ]

The second phrasing obviously is not just three times longer (9 lines versus 3), but also much more error prone. We believe that such "syntactic sugar" is beneficial for code developers, following the old rule "code lines which don't exist cannot contain errors". Additionally, as WCPS code will usually be generated by tools, writing these tools needs to be straightforward and intuitive in order to minimize programming errors which are hard to detect later on. The duplicate explicit coordinate addressing in the condense and trim operation above is a nice example of this.

But there is more to it: many of the "shorthand" operations lend themselves particularly well to optimization - in other words, the particular syntax is a kind of optimizing hint to the server. Explicit index addressing as above requires costly access operations, while in the compact formulation indexing is left to the system. This makes the rasdaman optimizer switch to a strategy of simply inspecting each cell in turn by iterating linearly over each tile. Cell access, then, is effectively reduced from evaluating a Horner scheme with several additions and multiplications to a simple pointer increment.

Finally, some operations are generic enough with respect to domain and range to allow for creating libraries of generic operations. For example, the short query version above is completely agnostic to dimension and extent of the input coverages. In summary, from a practical viewpoint many reasons speak against minimalistic language design, although the underlying model of course should be minimal in its basic concepts.

Further, optimization and intelligent orchestration are on the research agenda. Optimization has proven highly effective for speeding up coverage access and processing. Concerning storage optimization, adaptive tiling [12] and compression [10] turn out to be advantageous. Transparent integration of tertiary storage with emphasis on spatial clustering in tape cabinets has been investigated in [30]. As for processing optimization, several techniques have been shown to speed up response times. Request rewriting exploits algebraic equivalences to substitute query fragments by semantically equivalent fragments which execute faster; in [31] 150 algebraic equivalence rules have been developed, of which 40 are used in the rasdaman system for achieving a canonical query representation and 110 are optimizing. Parallel request processing in distributed environments has been implemented and tested in a Beowulf cluster [15]. Transposing OLAP preaggregation to imagery for fast multi-dimensional scaling and summarization is among our ongoing research [14].
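To make the earlier contrast between explicit index addressing and linear tile iteration concrete, the following Python sketch evaluates the same "all cells above a threshold" predicate in both ways. It is an illustration under stated assumptions, not the rasdaman implementation: a flat Python list in row-major order stands in for a tile, and the names `horner_offset`, `all_above_explicit`, and `all_above_linear` are hypothetical.

```python
def horner_offset(index, extent):
    """Linearize a multi-dimensional index (row-major order) with a Horner scheme."""
    offset = 0
    for i, n in zip(index, extent):
        offset = offset * n + i  # one multiplication and one addition per axis
    return offset


def all_above_explicit(tile, extent, threshold):
    """Explicit addressing: derive every cell's offset from its (x, y) coordinates."""
    width, height = extent
    return all(tile[horner_offset((x, y), extent)] > threshold
               for x in range(width) for y in range(height))


def all_above_linear(tile, threshold):
    """Linear scan: visit the cells in storage order, without address arithmetic."""
    return all(value > threshold for value in tile)


if __name__ == "__main__":
    extent = (100, 100)
    tile = [200] * (extent[0] * extent[1])  # hypothetical tile, all cells bright
    print(all_above_explicit(tile, extent, 127), all_above_linear(tile, 127))
    tile[horner_offset((3, 7), extent)] = 5  # darken a single cell
    print(all_above_explicit(tile, extent, 127), all_above_linear(tile, 127))
```

Both functions return the same answer. The linear variant needs neither the extent nor the dimension of the tile, which mirrors the remark that the short query is agnostic to dimension and extent.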
Recently, we have started to study just-in-time compilation, with promising first results [18][35]. A research project just launched investigates automatic orchestration and service dispatching. This requires dynamic analysis of the request and a comparison against the resources available. We intend to use cost-based models for distributed processing in heterogeneous environments.

On the conceptual level, the language is planned to be extended with manipulation functionality - if the current return clause corresponds to SQL's select, then the equivalents to SQL's insert, update, and delete are useful constructs. For example, an application may want to update part of a map by replacing an area given by a bounding box, a bounding polygon, or a mask.

While the WPS model as such was not found suitable for a tight semantic coupling between client and server, work has meanwhile started towards embedding the WCPS protocol into the WPS framework. Formally, this is foreseen to become a WPS Application Profile [6]. The WPS processing signature for WCPS is defined such that the input is a WCPS expression in string or XML representation and the output is a set of either coverages or scalar, XML encoded values. Hence, the features of both standards augment each other:

• WPS supports any kind of geo processing, whereas WCPS focuses on coverage processing;
• WPS consists only of a low-level framework for procedural invocation, whereas WCPS gives a high-level, concrete, and concise service specification;
• WPS specifies static services, whereas WCPS provides the flexibility of dynamic ad-hoc query formulation; in other words, WPS extension requires client and server side programming, whereas with WCPS this means composing a new string on client side, without any change to the server;
• WCPS allows phrasing of analytically expressible algorithms; WPS, on the other hand, by definition is Turing complete;
• As experience shows, WCPS offers a high potential for automatic chaining and optimization; WPS, on the other hand, typically requires manual server-side intervention, such as code tuning in supercomputing centers.

Hence, any tool implementor and, subsequently, service provider can choose between WPS's syntactic interoperability and WCPS's semantic interoperability. Large systems with algorithms too complex to be described analytically (such as climate simulations) or legacy systems best use WPS; when flexibility, high-level semantics, and scalability in local or distributed environments specifically for coverage data are at stake - such as in decision support - then WCPS offers a suitable interface.

Our experience shows that such a kind of service does not compete with imaging packages. Rather, there is an advantage in combining both: a WCPS-based server can perform data extraction and reduction through its preprocessing capabilities, say, reducing data size from an overall several Terabytes to several hundred Megabytes; a client-side special-purpose data analysis tool, then, can undertake further analysis on the extracted data.
An example where such a combination has proven useful is the WCS 3-D timeseries use case presented earlier; in this case the combination consisted of rasdaman as raster database and IDL on the Net for image processing and Web serving; similar experiences have been gathered with a Khoros coupling.

In summary, navigational interfaces for large coverage archives are already emerging today; the next step will consist of advancing from coverage data stewardship to service stewardship based on open, flexible access interfaces for value-adding processing, analysis, and mining. Application examples are manifold: Sensor and streaming databases will allow data subsetting, on-demand processing and summarisation, as well as standing queries for alerting. Hyperspectral satellite imagery will not just be served as is, but derived products like vegetation index or snow index will be computed on the fly and without redundant storage. Human brain imaging will benefit from analyzing thousands of brain activity maps simultaneously. Multi-Petabyte statistical datacubes can be leveraged for online analysis. This obviously poses new challenges for the design of open, interoperable services and their efficient implementation. WCPS is OGC's flexible, unified interface for semantic coverage services within and across domains.
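The comparison with WPS in this section observes that extending a WCPS-based service only means composing a new request string on the client side. As a minimal sketch of that idea, the following Python helpers assemble the shorthand `all` request shown earlier in this section. The function names `trim` and `all_above_query` are hypothetical, and the sketch only builds the string; how it is submitted to a server is deliberately left out.

```python
def trim(axis, lo, hi):
    """Render one axis trim in the colon notation of the example, e.g. x(0:99)."""
    return f"{axis}({lo}:{hi})"


def all_above_query(coverage, trims, threshold):
    """Compose the shorthand 'all' request for one coverage.

    `trims` maps an axis name to its (lo, hi) interval; axes are emitted
    in the insertion order of the dictionary.
    """
    subset = ", ".join(trim(axis, lo, hi) for axis, (lo, hi) in trims.items())
    return (f"for $c in ( {coverage} ) "
            f"return all( $c[ {subset} ] > {threshold} )")


if __name__ == "__main__":
    # The request from the text, followed by a variant over a different
    # window and threshold; neither needs any change on the server side.
    print(all_above_query("MyCoverage", {"x": (0, 99), "y": (0, 99)}, 127))
    print(all_above_query("MyCoverage", {"x": (10, 19), "y": (0, 49)}, 64))
```

A tool that generates requests this way stays small because the shorthand hides the coordinate bookkeeping, which is the practical argument made above for keeping such operations in the language.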

8. ACKNOWLEDGEMENT

The author is indebted to Arliss Whiteside, with whom he co-chairs the WCS.SWG. Steven Keens, with whom the author co-chairs the WCS.SWG, and Arliss Whiteside have contributed substantial suggestions for improvement during their proofreading of the WCPS draft. Ben Domenico continuously provides invaluable input, discussion, and insight as initiator and leader of the GALEON network. A big "thank you" goes to the rasdaFolks for their great work in implementing rasdaman, PetaScope, and EarthLook. The reviewers' insightful comments have helped to improve the paper significantly.

9. REFERENCES

[1] Intra-query parallelism for multidimensional array data. In 28th International Conference on Very Large Data Bases (VLDB) 2002, August 20, 2002.
[2] P. Baumann, editor. Web Coverage Processing Service (WCPS) Implementation Specification. Number 08-068. OGC, 2008.
[3] P. Baumann. Large-scale raster services: A case for databases (invited keynote). In 3rd Intl Workshop on Conceptual Modeling for Geographic Information Systems (CoMoGIS), volume Lecture Notes on Computer Science 4231, pages 75–84. Springer, 6-9 November 2006.
[4] P. Baumann. Language support for raster image manipulation in databases. In Proc. Int. Workshop on Graphics Modeling, Visualization in Science and Technology, April 13-14, 1992.
[5] P. Baumann. A database array algebra for spatio-temporal data and beyond. In Proc. 4th International Workshop on Next Generation Information Technologies and Systems (NGITS '99), volume Lecture Notes on Computer Science 1649, pages 76–93. Springer Verlag, July 5-7, 1999.
[6] P. Baumann and M. Owonibi, editors. Web Processing Service (WPS) Application Profile Extension for Web Coverage Processing Service (WCPS). Number 09-045. OGC, 2009.
[7] M. Botts, A. Robin, J. Davidson, and I. Simonis, editors. Sensor Web Enablement Architecture. Number 06-021r1. OGC, 2006.
[8] R. Cornacchia, S. Heman, M. Zukowski, A. de Vries, and P. Boncz. Flexible and efficient IR using array databases. Number Report INS-E0701. CWI, January 2007.
[9] J. de la Beaujardiere, editor. OGC Web Map Service (WMS) Implementation Specification. Number 06-042. OGC, 2004-01-20.
[10] A. Dehmel. A Compression Engine for Multidimensional Array Database Systems. PhD thesis, 2001.
[11] E. Diedrich, B. Buckl, D. Dietrich, and P. Seifert. WWW-based information retrieval from full resolution satellite images using a multi-dimensional data management system. In Online proceedings of EOGEO Workshop 2001, http://eogeo.net, 27.06.2001.
[12] P. Furtado and P. Baumann. Storage of multidimensional arrays based on arbitrary tiling. In Proceedings of the 15th International Conference on Data Engineering. IEEE Computer Society, 23-26 March 1999.
[13] A. G. Gutierrez and P. Baumann. Modeling fundamental geo-raster operations with array algebra. In IEEE International Workshop in Spatial and Spatio-Temporal Data Mining, October 2007.
[14] A. G. Gutierrez and P. Baumann. Computing aggregate queries in raster image databases using pre-aggregated data. In International Conference on Computer Science and Applications (ICCSA'08), 22-24 October, 2008.
[15] K. Hahn, B. Reiner, G. Hoefling, and P. Baumann. Parallel query support for multidimensional data: Inter-object parallelism. September 2002.
[16] W. H. Inmon. Building the Data Warehouse. Wiley, 1996.
[17] ITT. www.rsinc.com/envi, last seen: 2009-apr-25.
[18] C. Jucovschi. Precompiling Queries in a Raster Database System. Bachelor thesis, Jacobs University Bremen, 2008.
[19] K. Kleese and P. Baumann. Intelligent support for high i/o requirements of leading edge scientific codes on high-end computing systems - the estedi project. In Proceedings of the Sixth European SGI/Cray MPP Workshop, 7-8 September 2000.
[20] Leica Geosystems. gi.leica-geosystems.com/LGISub1x33x0.aspx, last seen: 2009-apr-25.
[21] L. Libkin, R. Machlin, and L. Wong. A query language for multidimensional arrays: design, implementation and optimization techniques. In Proc. International Conference on Management of Data (SIGMOD'96), pages 228–239.
[22] A. P. Marathe and K. Salem. Query processing techniques for arrays. The VLDB Journal, 11(1):68–91, 2002.
[23] The Mathworks. www.mathworks.com, last seen: 2009-apr-25.
[24] D. L. Moody and M. A. Kortink. From enterprise models to dimensional models: A methodology for data warehouse and data mart design. In M. Jeusfeld, H. Shu, M. Staudt, and G. Vossen, editors, Proc. International Workshop on Design and Management of Data Warehouses (DMDW 2000), June 5-6, 2000.
[25] D. Nebert, A. Whiteside, and P. Vretanos, editors. Catalogue Service Implementation Specification. Number 07-006r1. OGC, 2007.
[26] n.n. Geographic Information - Coverage Geometry and Functions. Number 19123:2005. ISO, 2005.
[27] n.n. Abstract Specification Topic 6: Schema for coverage geometry and functions. Number 07-011. OGC, 2007.
[28] n.n. rasdaman query language guide. rasdaman GmbH, 7.0 edition, 2008.
[29] V. Panagiotis, editor. Web Feature Service (WFS) Implementation Specification. Number 04-094. OGC, 2005.
[30] B. Reiner, K. Hahn, and G. Höfling. Tertiary storage support for large-scale multidimensional array database management systems. In 28th International Conference on Very Large Data Bases (VLDB) 2002, 20.08.2002.
[31] R. Ritsch. Optimization and Evaluation of Array Queries in Database Management Systems. PhD thesis, 2002.
[32] G. Ritter, J. Wilson, and J. Davidson. Image algebra: An overview. Computer Vision, Graphics, and Image Processing, 49(1):297–336, 1994.
[33] C. Sapia. On modeling and predicting query behavior in OLAP systems. In Proceedings of the Intl. Workshop on Design and Management of Data Warehouses, DMDW'99, June 14-15, 1999.
[34] P. Schut, editor. Web Processing Service Implementation Specification. Number 05-007r7. OGC, 2007-06-08.
[35] S. Stancu-Mara. Using Graphic Cards for Accelerating Raster Database Query Processing. Bachelor thesis, Jacobs University Bremen, 2008.
[36] G. Vowles, editor. Geospatial Digital Rights Management Reference Model. Number 06-004r3. OGC, 2004-01-20.
[37] A. Whiteside, editor. OGC Web Services Common Specification. Number 06-121r3. OGC, 2007.
[38] A. Whiteside, editor. Web Coverage Service (WCS) Transaction Operation Extension. Number 07-068r4. OGC, 2008.
[39] A. Whiteside and J. Evans, editors. Web Coverage Service (WCS) Implementation Specification. Number 07-067r5. OGC, 2008.
[40] N. Widmann and P. Baumann. Efficient execution of operations in a DBMS for multidimensional arrays. In Statistical and Scientific Database Management, pages 155–165, 1998.
