
Earth Sci Inform (2015) 8:711–719 DOI 10.1007/s12145-014-0189-8

SOFTWARE ARTICLE

Facilitate earth science data interoperability using the SCIDIP-ES data virtualisation toolkit

Jinsongdi Yu & Peter Baumann & Shirley Crompton & Sheng Wu & Yimin Lu

Received: 16 May 2014 / Accepted: 23 October 2014 / Published online: 5 November 2014
© Springer-Verlag Berlin Heidelberg 2014

Abstract Ensuring long-term accessibility of Earth Science archive data is a recurrent issue for data centers. Heterogeneity of data adds particular challenges. The Data Virtualisation Toolkit (DVT) has been developed by the SCIDIP-ES project to support long-term access and use of heterogeneous Earth Science (ES) data in a format-independent manner. DVT provides four key functions: (a) edit format description, (b) interpret bit stream into data values using the format description, (c) construct legacy information from data values via coherent information models, and (d) visualize the retrieved legacy information if the information model is supported by the visualization component. The toolkit incorporates a tree structure editor, a bit stream interpretation engine, information models in XQuery, and a visualization component. DVT is designed to provide long-term access to Earth Science data with the proper representation information, which contains the technical knowledge required for interpretation. A trial application using both vector and raster data shows that DVT could provide interoperable solutions to support the long-term preservation of ES data. This paper reports on the concept, development, and implementation of DVT.

Keywords OAIS · Representation information · Data virtualization · Interoperability · OGC coverage

Communicated by: H. A. Babaie

Electronic supplementary material The online version of this article (doi:10.1007/s12145-014-0189-8) contains supplementary material, which is available to authorized users.

J. Yu (*) · S. Wu · Y. Lu
Fuzhou University, No. 2, Xueyuan Road, Daxue New District, 350108 Fuzhou, China
e-mail: [email protected]

J. Yu · P. Baumann
Jacobs University Bremen, Campus Ring 1, 28759 Bremen, Germany

S. Crompton
STFC Daresbury Lab, Daresbury, Warrington, Cheshire WA4 4AD, UK

Introduction

Multi-disciplinary Earth System Science often involves the use of emerging, unfamiliar Earth Science (ES) archives, which tend to be so complex that they become difficult to process using on-hand or traditional data processing applications. ES data is heterogeneous in nature, comprising diverse formats due to the variety of data acquisition systems used to collect the measurements and fundamental differences in domain standards. Geographic Information Science (GIS) data models (Longley et al. 2010) are designed to capture, manipulate, analyze, manage, and represent all types of Earth phenomena, such as land use, elevation, trees, water, rainfall amount, etc. Normally, these models are categorized into vector and raster, in addition to table format. Vector data explicitly include the location of each vertex in a Coordinate Reference System, as points, lines, and polygons, while raster data implicitly define the cells' geographic location via the corresponding boundaries. Points, lines, polygons and cells are all serializable as a sequence of geometric objects, which is similar to a set of tuples in a table. Each tuple can be regarded as a key-value pair: the key is a geometric object, and the value is the associated attributes, such as temperature, salinity, etc.

However, when accessing these archived data, there is the added complexity of dealing with the format-specific bit location as well as structure differences introduced by the diverse computing hardware on which the data was created. Furthermore, the pertaining archive specifications convey considerable complexity, including common and domain independent information models both within and between specifications. Due to continuous technology change and development, data access technologies are typically not standardized or lack support. Without adequate mitigation actions, we risk losing the rich legacy of historical Earth information amassed through generations of scientific endeavor. Data virtualization is a crucial process for ensuring continuous long-term access to digital materials (CCSDS 2012a) by turning the unfamiliar bit stream into accessible information.

This study presents a Data Virtualization Toolkit (DVT), which is designed to ensure long-term access to heterogeneous ES data in a format-independent manner, regardless of the endianness of the hardware used. DVT is a component of the SCIence Data Infrastructure for Preservation of ES being implemented by the EU FP7 SCIDIP-ES project (Shaon et al. 2012). The SCIDIP-ES¹ project aims to provide preservation e-infrastructure services and toolkits which support persistent storage, access and management needs, with a specific focus on the Earth Science community disciplines including Earth Observation, climate change, oceanography, and atmospheric science. By adding specific services and toolkits which implement core preservation concepts, the long-term access and exploitation of data assets across and beyond their designated communities is supported now and into the future (Shaon et al. 2012).
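To make the byte-order issue raised in this section concrete, the following is a minimal Python sketch, not part of DVT. It assumes a hypothetical fixed-size record layout (two 64-bit coordinates followed by one 32-bit temperature value) and decodes the same archived bytes into (geometry, attributes) key-value pairs under both byte orders. The layout, the function name `decode` and the sample values are illustrative assumptions.

```python
import struct

# Hypothetical record layout (an assumption for illustration):
# two float64 coordinates followed by one float32 attribute.
RECORD = "ddf"

def decode(raw: bytes, byte_order: str):
    """Decode fixed-size records into (geometry, attributes) key-value pairs."""
    fmt = byte_order + RECORD            # ">" = big-endian, "<" = little-endian
    size = struct.calcsize(fmt)          # 20 bytes, no alignment padding
    pairs = []
    for offset in range(0, len(raw), size):
        x, y, temperature = struct.unpack_from(fmt, raw, offset)
        pairs.append(((x, y), {"temperature": temperature}))
    return pairs

# Two records as a big-endian machine might have written them.
archived = struct.pack(">ddf", 10.0, 50.0, 21.5) + struct.pack(">ddf", 11.0, 51.0, 19.0)

correct = decode(archived, ">")   # byte order matches the writer
wrong = decode(archived, "<")     # same bits, wrong byte order

print(correct)
print(wrong == correct)
```

Decoding with the wrong byte order does not fail; it silently yields different numbers, which is why the byte order has to be recorded alongside the data.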
The digital preservation of Earth Science data has received considerable attention in recent years due to global concerns about climate change, and this has led to the development of several Open Archival Information System (OAIS)-based preservation projects. OAIS is an ISO reference model for long-term preservation of digital (and other) archives (CCSDS 2012a). There are several ongoing and potential collaborations between SCIDIP-ES and these projects (Shaon et al. 2012). For example, SCIDIP-ES has leveraged the SHAMAN² work for addressing the scalability and other related issues associated with the Process Virtualization and Emulation, and Packaging Toolkit. SCIDIP-ES has also collaborated with the APARSEN³ project, which is developing an interoperable framework for persistent identifier services.

The remainder of this paper is organized as follows: Section 2 gives the state of the art. Section 3 presents the concept and design of DVT. Section 4 gives applications to commonly used data formats, a detailed walk-through using NetCDF⁴ data, and a discussion of the extra performance overhead of DVT interpretation. Section 5 summarizes the work and its impact to date.

¹ The SCIDIP-ES project - http://www.scidip-es.eu
² The SHAMAN project - http://shaman-ip.eu/shaman/

State of the Art

Emerging data description technologies and ES standards are available as input to address the interoperability issue at the bit level. For example, the Data Description Record (DDR) adopts a standardized approach to specifying data descriptions according to the EAST (Enhanced Ada SubseT) language (DEBAT 2003), which is an ISO and CCSDS standard language. The Data Format Description Language (DFDL) describes mappings between original data and a tree representation in XML (Powell et al. 2011). The Data Request Broker (DRB) (GAEL 2008) adopts a tree model, called Structured Data File (SDF), to deal with binary data. These technologies facilitate the extraction of data values from archives (CCSDS 2012b), for example using the DRB to interpret ENVISAT⁵ data and extract relevant information regardless of physical format (Mbaye 2004). These technologies focus on the structural level without dealing with ES data models. Hence, further development is needed to integrate ES knowledge before they can be applied in ES data interpretation.

ISO 19100 is a series of international standards for information about phenomena that are directly or indirectly associated with a location relative to the Earth. An ES phenomenon is denoted as a function f: D→R, where D denotes the location and R denotes the recorded attributes. Currently, ISO together with the Open Geospatial Consortium (OGC) is leading the development of both abstract and implementation ES standards to support the interoperability of spatial information infrastructures. For example, ISO 19123 is abstract in the sense that many implementations are possible (ISO 19123 2005). A coverage specification, the OGC GML 3.2.1 Application Schema – Coverages standard (GMLCOV) (Baumann 2012), has been established based on the abstract model. It is concrete in terms of basic building blocks, expressed in XML, and integrity constraints, defined in the corresponding schema and Schematron. Therefore, the model is machine-readable (OMB 2014) and allows interoperation across multiple technology implementations.

³ The APARSEN Project - http://www.alliancepermanentaccess.org/index.php/aparsen/
⁴ http://www.unidata.ucar.edu/netcdf
⁵ https://earth.esa.int/web/guest/missions/esa-operational-eo-missions/envisat
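The function view f: D→R used above can be illustrated with a small Python sketch. It is only an illustration of the abstract idea, not an implementation of ISO 19123 or GMLCOV. The class name `DiscreteCoverage`, the method names and the sample values are assumptions chosen for this example.

```python
from typing import Dict, List, Tuple

Location = Tuple[float, float]   # an element of the domain D (here: x, y)
Record = Dict[str, float]        # an element of the range R (attribute record)

class DiscreteCoverage:
    """A coverage seen as a function f: D -> R over a finite set of locations."""

    def __init__(self, samples: Dict[Location, Record]):
        self._samples = dict(samples)

    def domain(self) -> List[Location]:
        """Return the locations for which the function is defined."""
        return sorted(self._samples)

    def evaluate(self, location: Location) -> Record:
        """Return the attribute record recorded at a location."""
        return self._samples[location]

# Hypothetical sample values for two locations.
coverage = DiscreteCoverage({
    (0.0, 0.0): {"temperature": 12.5, "salinity": 35.0},
    (1.0, 0.0): {"temperature": 13.0, "salinity": 34.8},
})

print(coverage.domain())
print(coverage.evaluate((1.0, 0.0)))
```

An abstract model fixes only this functional contract, which is why many different encodings can satisfy it and why a concrete schema is needed for interoperability.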

Main concepts

This section describes the approaches and models used in DVT for data abstraction. DVT uses the concept of OAIS Representation Information to provide support for exploring the structure and contents of a digital data object. Representation Information is defined in the OAIS Reference Model as the additional information that maps a Data Object into more meaningful concepts (CCSDS 2012a; Giaretta and Rankin 2011). OAIS categorizes Representation Information into three main types: Structure, Semantic and Other. Structural Representation Information describes the format or data structure concepts, expressed as rules for mapping the bit sequences into data types (basic building blocks) and then to the higher-level concepts needed to understand the digital data object.

Normally, data objects in computers are represented as sequences of bits (each either 0 or 1). Usually, bits are grouped together to encode and represent data values in some data type, specifically a primitive data type (PDT), which is a basic building block handled by programming languages (Giaretta and Rankin 2011). Mathematically, the mapping can be represented as {d→v}, where d is a sequence of 0/1 and v is an instance of one of the PDTs. To interpret 0/1 sequences, the following physical data structures are considered:

• Enumeration (extensional definition): ideally, the order of PDTs is enumerated sequentially. Consequently, data instances can be extracted according to this enumeration. However, it is clumsy to enumerate a list of identical PDTs for intensionally defined data, e.g. a list or an array.

• Intensional definition: there are cases where the same PDT is repeated in a sequence. Normally, the definition can be denoted as a 3-tuple (s, t, n), where s denotes the starting location of the sequence, t denotes the PDT, and n denotes the number of occurrences. Sometimes an end marker is used for the sequence. In this case, the definition can be denoted as a 4-tuple (s, t, p, m), where s denotes the starting location of the sequence, t denotes the PDT, p denotes padding and m denotes the end marker. Padding is needed to account for unaddressed bits when the sequence length is not a multiple of the PDT length.

• Functional definition: a file may provide some self-description in its bit stream, e.g. NetCDF data. Some of these descriptive items set the structure of the PDTs that follow. This definition is denoted as a function f: i→o, where i denotes the input description and o denotes the output description. For example, NetCDF data provides its variable blob description in its header, which includes the blob size and data type.

• Hierarchy: a file contains both individual data values and groups of values. If data values are treated as the leaves of a tree and groups of data values as its branches, the physical data structure can be seen as a tree, in which case both leaves and branches can be easily located with XPath or manipulated with XQuery FLWOR expressions.
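The physical data structures listed above can be sketched in Python as follows. This is a hedged illustration, not the DRB engine used by DVT. The bit stream layout, the function names and the XML element names are hypothetical, and the padding term p of the 4-tuple is omitted for brevity. Python's `struct` and `xml.etree.ElementTree` modules stand in for the interpretation engine and for XPath.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical bit stream: a 2-byte count, that many big-endian int16 values,
# then a run of unsigned bytes terminated by the end marker 0xFF.
stream = struct.pack(">H3h", 3, 10, -20, 30) + bytes([1, 2, 3, 0xFF])

def read_intensional(data, s, t, n):
    """Intensional definition (s, t, n): n values of PDT t starting at offset s."""
    return list(struct.unpack_from(f">{n}{t}", data, s))

def read_until_marker(data, s, t, m):
    """End-marker variant: read values of PDT t from offset s until marker m."""
    size = struct.calcsize(">" + t)
    values = []
    while True:
        (value,) = struct.unpack_from(">" + t, data, s)
        if value == m:
            return values
        values.append(value)
        s += size

def read_functional(data):
    """Functional definition f: i -> o, where a header value sets what follows."""
    (count,) = struct.unpack_from(">H", data, 0)    # input description i
    return read_intensional(data, 2, "h", count)    # output description o

def to_tree(array_values, tail_values):
    """Hierarchy: values become leaves, groups of values become branches."""
    root = ET.Element("file")
    array = ET.SubElement(root, "array")
    for value in array_values:
        ET.SubElement(array, "value").text = str(value)
    tail = ET.SubElement(root, "tail")
    for value in tail_values:
        ET.SubElement(tail, "value").text = str(value)
    return root

array_values = read_functional(stream)
tail_values = read_until_marker(stream, 8, "B", 0xFF)
tree = to_tree(array_values, tail_values)

# Leaves can then be located with a path expression (a limited XPath subset).
leaves = [element.text for element in tree.findall("array/value")]
print(array_values, tail_values, leaves)
```

The enumeration case corresponds to writing out one format code per value, for example `">hhh"` instead of the counted form `">3h"`, which is what makes it clumsy for long arrays.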

The additional ES models provide geospatial information about PDTs by attaching definitions, attributes, comments, descriptions, etc. In an OAIS-conformant archive, when a user from the target community accesses unfamiliar data, the archive provides a pre-defined series of Representation Information to help the user interpret the data. However, different communities typically have their own preferred domain information models, resulting in diverse Representation Information being used to interpret the same PDTs kept in different archives. Therefore, the interpreted results tend to be heterogeneous and a common understanding is not available. In ES, standardization of geospatial information is available at both the abstract and the implementation level. Abstract models are not recommended for direct use because many implementations are possible for the same abstract model. Therefore, implementation standards (e.g. OGC GML, GMLCOV) which have been accepted as community best practice are used to provide a common understanding of Earth phenomena and the contents of the corresponding archives. These models are widely used for building ES Representation Information that can be used to achieve interoperability in long-term access.

Design of the toolkit

Accordingly, DVT is built on a bit stream interpretation Application Programming Interface (API), the DRB API in this study, to provide a software abstraction layer that helps developers separate data-format-specific concerns from their analysis or data access applications. Abstraction is achieved using the physical data object model and standardized ES models. DVT is implemented as a Graphical User Interface (GUI) tool that offers four main components to support:

• Editing of representation information,
• Interpretation of the bit stream according to the Structure Representation Information associated with the digital data object,
• Construction of legacy information from data values via standardized information models, and
• Visualization of the mapping component output.

Fig. 1 DVT data flow diagram

Figure 1 shows the input data and the results obtained from each component in a data flow diagram. The structural representation editing component (SREC) performs create, delete, update and insert operations on an ordered tree structure. The result is a structural description which describes the physical data structure. The bit stream interpretation component (BSIC) takes original digital data from archives as input and interprets the bit stream according to the given structure representation. The result is an ordered tree instance of the given representation model. The mapping component (MC) takes the ordered tree instance as input and constructs legacy information from data values according to the mapping rules of the given standardized information model. The result is an instance of the given model. The visualization component (VC) takes the mapping component output and visualizes it if the data format is supported by the visualization component.

This study designs a data virtualization workflow pattern comprising Creation of representation information, Interpretation of the bit stream, Mapping to standardized ES models and Visualization. This pattern divides a virtualization workflow into four interconnected parts, so as to separate the internal interpretation of information from the ways that the information is presented to or input by the user. We call this workflow pattern CIMV. The four processes are conducted in a loosely coupled way. Conditions are imposed on both the input and the output of each process. Only components whose output and input conditions match can be concatenated in a data virtualization application. The naturalization of the DVT interpretation, mapping and visualization enables researchers to use unfamiliar ES archives in terms of these results if the OAIS concepts hold.

The demonstration workflow in this study dynamically links four main software components as an implementation of the CIMV pattern. Specifically, these are Java Swing JTextArea⁶ for editing representation information, DRB for interpretation of the bit stream (GAEL 2008), Saxon⁷ for XQuery FLWOR expressions and TOPCAT for tabular CSV data visualization (Taylor 2013). The toolkit⁸ is delivered as a standalone desktop application with a uniform front-end on the specified components.

⁶ http://docs.oracle.com/javase/7/docs/api/javax/swing/JTextArea.html, accessed 15 April 2014
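The loose coupling of the CIMV pattern, where components can only be concatenated when their output and input conditions match, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the DVT implementation, which links Java components. The `Stage` class, the condition labels and the toy stage functions are all hypothetical. A `struct` format string stands in for the representation information and a flat list stands in for the ordered tree.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Stage:
    name: str
    accepts: str                   # condition on the input of this stage
    produces: str                  # condition on the output of this stage
    run: Callable[[Any], Any]

def chain(stages: List[Stage]) -> Callable[[Any], Any]:
    """Concatenate stages, refusing any pair whose conditions do not match."""
    for upstream, downstream in zip(stages, stages[1:]):
        if upstream.produces != downstream.accepts:
            raise ValueError(
                f"{upstream.name} produces {upstream.produces!r}, "
                f"but {downstream.name} accepts {downstream.accepts!r}"
            )

    def pipeline(data: Any) -> Any:
        for stage in stages:
            data = stage.run(data)
        return data

    return pipeline

# Toy stand-ins for the four CIMV processes.
creation = Stage("creation", "bit stream", "bit stream + description",
                 lambda raw: (raw, ">3h"))
interpretation = Stage("interpretation", "bit stream + description", "ordered tree",
                       lambda pair: list(struct.unpack(pair[1], pair[0])))
mapping = Stage("mapping", "ordered tree", "model instance",
                lambda values: "\n".join(f"{i},{v}" for i, v in enumerate(values)))
visualization = Stage("visualization", "model instance", "rendering",
                      lambda csv: csv.replace(",", " -> "))

cimv = chain([creation, interpretation, mapping, visualization])
result = cimv(struct.pack(">3h", 1, 2, 3))
print(result)
```

Because each stage only declares what it accepts and produces, any one of them could be swapped for another implementation with the same conditions without touching the rest of the chain.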

Use cases

Two case studies are used to illustrate how DVT facilitates interoperability and allows long-term access to ES raster and vector data. The CIMV workflow is demonstrated for the retrieval and navigation of both raster and vector archives. Generally, the approach allows access to heterogeneous ES data which are coherent with GMLCOV (Baumann 2012), such as the GridCoverage, MultiPoint, MultiCurve, and MultiSurface models. The approach is a generic solution which can be adapted for both raster and vector data. Raster or vector data which are coherent with GMLCOV follow the same process flow using the appropriate Representation Information and interpretation. We illustrate the vector and raster cases separately.

For example, an ESRI shapefile (.shp) which describes the site contours of the Villa of Livia (Giaretta 2011) can be interpreted as point lists in XML, with each contour containing a list of points. The first and last points of each list are the same. An interpreted contour is shown below:

292391.71770954644 4653680.343077232
292370.5126930594 4653527.011624375
292285.6916444991 4653300.276680467
292285.6916735157 4653288.858384792
292391.71770954644 4653680.343077232

Then, the point lists can be turned into MultiSurface in GML (Portele 2007) by XQuery FLWOR expression. The corresponding MultiSurface is as below:

292391.71770954644 4653680.343077232
292370.5126930594 4653527.011624375
292285.6916444991 4653300.276680467
292285.6916735157 4653288.858384792
...
292391.71770954644 4653680.343077232
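The conversion of interpreted point lists into a GML MultiSurface can be sketched as follows. DVT performs this step with XQuery FLWOR expressions. The Python below is a swapped-in illustration of the same mapping, not the toolkit's own code. The function name `multisurface`, the `gml:id` values and the placeholder `srsName` are assumptions, since the source does not state the coordinate reference system. Coordinates are kept as strings so the archived values are passed through unchanged.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml/3.2"   # GML 3.2.1 namespace
ET.register_namespace("gml", GML)

def multisurface(contours, srs_name):
    """Build a gml:MultiSurface with one polygon per closed contour."""
    surface = ET.Element(
        f"{{{GML}}}MultiSurface",
        {f"{{{GML}}}id": "ms1", "srsName": srs_name},
    )
    for index, points in enumerate(contours, start=1):
        if points[0] != points[-1]:
            raise ValueError("contour is not closed: first and last points differ")
        member = ET.SubElement(surface, f"{{{GML}}}surfaceMember")
        polygon = ET.SubElement(member, f"{{{GML}}}Polygon",
                                {f"{{{GML}}}id": f"p{index}"})
        exterior = ET.SubElement(polygon, f"{{{GML}}}exterior")
        ring = ET.SubElement(exterior, f"{{{GML}}}LinearRing")
        pos_list = ET.SubElement(ring, f"{{{GML}}}posList")
        pos_list.text = " ".join(f"{x} {y}" for x, y in points)
    return surface

# The contour points listed in the text above.
contour = [
    ("292391.71770954644", "4653680.343077232"),
    ("292370.5126930594", "4653527.011624375"),
    ("292285.6916444991", "4653300.276680467"),
    ("292285.6916735157", "4653288.858384792"),
    ("292391.71770954644", "4653680.343077232"),
]

# The srsName is a placeholder, not the real CRS of the dataset.
gml = multisurface([contour], srs_name="urn:x-example:crs:unknown")
print(ET.tostring(gml, encoding="unicode"))
```

The closure check mirrors the property noted in the text that the first and last points of each list coincide, which a linear ring requires.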

The toolkit facilitates access not only to vector datasets but also to raster data in a format-independent manner, provided the appropriate representation information is available. The Natural Environment Research Council (NERC) Mesosphere-Stratosphere-Troposphere (MST) Radar⁹ is a 46.5 MHz pulsed Doppler radar used for studies of atmospheric winds, waves and turbulence. These radars give continuous measurements of the three-dimensional wind vector and additional information about humidity and static stability. The archive is crucial for studying the trend of atmospheric turbulence. The MST data is archived in network Common Data Form (NetCDF) format, which is an open, self-describing, machine-independent, and array-oriented scientific data standard. Mathematically, the NetCDF model is a function which can be mapped to the ISO coverage model (Davis et al. 2008; Nativi et al. 2008) and encoded as a coverage (Domenico and Nativi 2011) to improve the interoperability of existing archives. Data virtualization is considered a promising solution for achieving the long-term goal of access to Earth archives across multiple disciplines (Yu et al. 2014). DVT is used to illustrate how to facilitate interoperability and allow long-term access to NetCDF data with proper Representation Information.

⁷ http://saxon.sourceforge.net/, accessed 15 April 2014
⁸ http://int-platform.scidip-es.eu/joomla/, accessed 15 April 2014
⁹ http://mst.nerc.ac.uk/nerc_mst_radar.html

Fig. 2 DRB NetCDF format description

Figure 2 shows the structural representation information of the NetCDF format as a DRB description. The description is written in XML Schema and ordered in a tree in which all elements are clearly distinguished from each other, including the NetCDF magic code (Unidata 2014), dimensions, variables, etc. Figure 3 shows the business process flow diagram of the NetCDF use case. DVT can be used to create the DRB description if it does not exist. The NetCDF data can be interpreted into an XML document according to the DRB description. Then, this XML document can be converted into a CSV table or a grid coverage in GMLCOV by XQuery. The CSV result can be visualized by TOPCAT for better interpretation, and GMLCOV can be used to improve Earth data access.

In this use case, the extensional way results in a CSV which is a list of tuples (height, time, water_vapour), see Appendix A. The extensional representation of the function is the enumeration of each (location, attributes) pair. As each location consists of a list of coordinates, one per dimension, the location can be represented as a tuple of coordinates (coord1, coord2, …). The attributes can be represented as a tuple of attributes (attrib1, attrib2, …). Therefore, each pair is denoted as a tuple (coord1, coord2, …, attrib1, attrib2, …) and the result is a sequence of such tuples. The tabular result is visualized by TOPCAT (Taylor 2013), see Fig. 4. In this visualization, the height and time are the axes of a 2D dot map, and each dot is labeled with a water vapor value.

The intensional way results in a grid coverage, see Appendix B. A grid coverage is an intensional representation of the function (see ISO 19123 2005). Its spatial, temporal or spatiotemporal extent is a pair of lower and upper corner lists. Its attribute values, which are a sequence of attribute tuples, are associated with coverage locations according to a grid function. This function includes a start point that is mapped to the first attribute value and a sequence rule that describes how the grid points are ordered for association with the elements of the value sequence. This sequence rule is represented by an axis order that provides the increment order of the grid axes. Each point in the grid coverage has a corresponding tuple in the table.
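The relationship between the two representations can be sketched in Python: an intensional grid description (lower and upper corners, an axis order as sequence rule, and a flat sequence of attribute values) is expanded into the extensional list of (coord1, coord2, attrib1) tuples. This is an illustrative sketch, not the XQuery mapping used by DVT. The function name `expand`, the convention that the axis order lists axes from fastest- to slowest-varying, and the sample values are assumptions.

```python
from itertools import product

def expand(low, high, axis_order, values):
    """Enumerate (coord..., attribute) tuples from an intensional grid description.

    low, high  -- lower and upper corner, one integer grid index per axis
    axis_order -- axes listed from fastest- to slowest-varying (assumed convention)
    values     -- flat sequence of attribute values in sequence-rule order
    """
    axes = len(low)
    extents = [range(low[axis], high[axis] + 1) for axis in range(axes)]
    expected = 1
    for extent in extents:
        expected *= len(extent)
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")

    # itertools.product varies its last argument fastest, so order the axes
    # from slowest to fastest before enumerating.
    slow_to_fast = list(reversed(axis_order))
    tuples = []
    for value, combo in zip(values, product(*(extents[a] for a in slow_to_fast))):
        coord = [0] * axes
        for axis, index in zip(slow_to_fast, combo):
            coord[axis] = index
        tuples.append((*coord, value))
    return tuples

# Hypothetical 2 x 3 grid: axis 0 = height index, axis 1 = time index,
# with made-up water vapour values.
water_vapour = [10, 11, 12, 13, 14, 15]
time_fastest = expand((0, 0), (1, 2), axis_order=[1, 0], values=water_vapour)
height_fastest = expand((0, 0), (1, 2), axis_order=[0, 1], values=water_vapour)

print(time_fastest)
print(height_fastest)
```

The same six values land on different grid points depending on the axis order, which is why the sequence rule has to be part of the representation information.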

Cost and generality discussion

Although DVT facilitates interoperability and allows long-term access to NetCDF data, the interpretation incurs extra performance overhead. Thus, DVT is not recommended as a replacement for standard NetCDF tools, but as a supplementary tool which can be used to facilitate interoperability by mapping to GMLCOV. Whichever definition is adopted (enumeration, intensional, functional, or hierarchy), the original bit stream needs to be interpreted according to the given representation information. In this study, PDTs are interpreted as XML elements, which requires extra input/output (I/O) for parsing the XML tags. To illustrate the I/O overhead, we use two different structural representation information objects to interpret arrays. An array can be interpreted either intensionally, as one element with a list of values, or extensionally, as a list of value elements with separate tags. These extra tags increase I/O significantly.

We have written a test script in Unix shell (bash) to benchmark the cost by giving the two different structural representation information files to interpret the NCAR NetCDF test data¹⁰ (see Table 1). The intensional approach interprets a NetCDF variable into a list of values in one element and the extensional one interprets a NetCDF variable into a list of value elements. Time1 shows the performance of the intensional approach and Time2 shows the performance of the extensional approach. Line2 shows that the extensional approach creates a lot of extra XML tags and degrades performance as the number of lines increases. Hence, the extra expense depends heavily on the quality of the representation information. This affects the response time of operational systems when DVT is used to mediate access, particularly if the system is under-scaled to cope with unanticipated structure knowledge or is not tuned.

¹⁰ www.unidata.ucar.edu/software/netcdf/examples/files.html, last accessed 5 May 2014
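The difference between the two interpretations can be made tangible with a small Python sketch that serializes the same array both ways and counts the resulting markup. It is an illustration only and does not reproduce the DRB output or the benchmark in Table 1. The element names `variable`, `values` and `value` and the sample array are hypothetical.

```python
import xml.etree.ElementTree as ET

def intensional_xml(name, values):
    """One element holding the whole array as a whitespace-separated list."""
    variable = ET.Element("variable", {"name": name})
    ET.SubElement(variable, "values").text = " ".join(map(str, values))
    return ET.tostring(variable)

def extensional_xml(name, values):
    """One element per array value, each with its own pair of tags."""
    variable = ET.Element("variable", {"name": name})
    for value in values:
        ET.SubElement(variable, "value").text = str(value)
    return ET.tostring(variable)

values = list(range(1000))                 # hypothetical array content
compact = intensional_xml("tos", values)
verbose = extensional_xml("tos", values)

# Count tags by counting "<" characters in the serialized bytes.
print(len(compact), compact.count(b"<"))
print(len(verbose), verbose.count(b"<"))

# Both forms carry the same data values.
from_compact = [int(v) for v in ET.fromstring(compact).find("values").text.split()]
from_verbose = [int(e.text) for e in ET.fromstring(verbose).findall("value")]
```

For an array of n values the extensional form adds 2n tags where the intensional form adds two, so the gap grows with array size, in line with the trend the text reports for Line2 and Time2.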


Fig. 3 Process flow of the NetCDF use case

Fig. 4 2D visualization with TOPCAT

Table 1 Virtualization benchmark

Data                              Size (MB)  Time1 (s)  Line1   Time2 (s)  Line2
sgpsondewnpnC1                    0.178      2.93       1584    2.85       1584
ncswp_SPOL_RHI_                   0.817      3.5        3730    6.855      18895
ncswp_SPOL_PPI_                   1.921      3.663      3730    7.049      18895
testrh                            0.04       2.419      139     9.962      30080
tos_O1_2001-2002                  2.881      2.771      875     19.828     95781
GLASS                             0.29       4.494      2991    36.579     213015
rhum.2003                         59.957     2.646      815     31.727     253726
GOTEX.C130_N130AR.LRT.RF06.PNI    8.067      6.52       28233   46.559     294513
ECMWF_ERA-40_subset               21.646     3.526      2109    61.96      538835
sresa1b_ncar_ccsm3_0_run1_200001  2.704      4.285      1492    309.413    2069344
slim_100897_198                   5.228      5.482      1103    301.644    2427494
t19981111_0045                    1.054      2.839      507     210.748    3232365
HRDL_iop12_19991027024421         4.411      5.196      1039    402.642    3386438

Conclusions and outlook

By means of data virtualization, DVT abstracts the bit stream detail of archived Earth data, thereby making different data sources accessible from a unified data access point and allowing users to inspect the content and structure of a digital data object using Representation Information, e.g. viewing a NetCDF-based file in tabulated format without a dedicated NetCDF viewer, or transforming the data to a standardized ES model. DVT is distributed by SCIDIP-ES as a stand-alone toolkit that forms part of its preservation e-infrastructure. This e-infrastructure also includes a RESTful registry service as a repository for Representation Information (Shaon et al. 2012). GENESI-DEC, which is an Earth Observation catalog of the portal of the Global Earth Observation System of Systems (GEOSS), has initiated a collaboration with SCIDIP-ES to deploy a representation information registry to facilitate access to data (Cossu 2012).

Further improvements can be made to DVT. For example, there will always be some gaps between the given representation information and the desired representation information as the targeted community changes. By computing intelligibility gaps between given representation dependency models, the toolkit should be able to identify gaps, and users can then create the required Representation Information to fill them. Moreover, a client always needs to retrieve both the archived data and its representation information before it can interpret the data correctly. An interaction with the OGC GMLCOV model to facilitate both data and representation information retrieval has been proposed (Yu et al. 2014) as a potential interoperation solution. GMLCOV is open to Uniform Resource Locator (URL) links, e.g. by using representation information resources via a RESTful protocol. Web Coverage Service (WCS) 2.0 provides access to coverage data through a scalable core and extension mechanism (OGC policy SWG 2009). The core establishes basic coverage retrieval operations based on a unified coverage model. A set of extensions defines additional functionality, e.g. the Key-Value Pair (KVP) extension (Baumann 2013), which expands server capabilities to meet specific interaction requirements.

It should be noted that DVT is not recommended as a replacement for legacy data access technologies, as it affects operational system response time when it is used as a mediation tool, particularly if the structure representation information is not finely tuned to prevent performance degradation. However, performance may be improved by either scaling up or scaling out. As a single node cannot be expanded beyond a certain limit, scaling out may be the better solution. Adding more nodes should provide an easy and virtually limitless way of expanding DVT capability in a distributed environment, such as a cloud infrastructure. In summary, DVT can be finely tuned to address the goal of seamlessly accessing multidisciplinary archives, which will help existing and future researchers to navigate these archives.

Acknowledgments The work carried out in this study was supported by the SCIDIP-ES and EarthServer projects, which are funded by the Seventh Framework Programme (FP7) of the European Commission (EC) under Grant Agreements 283401 and 283610 respectively, by NSFC (Project 41401454) and by the Natural Science Foundation of Fujian Province of China (No. 2014J05048).

References

CCSDS (2012a) Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-M-2, http://public.ccsds.org/publications/archive/650x0m2.pdf, Accessed 26 August 2013
GAEL (2008) Data Request Broker DRB API, http://www.gael.fr/drb/doc/GAEL-P243-DOC-001-01-01_DRB_API_handbook.pdf, Accessed 26 August 2013
DEBAT (2003) DEBAT: Development of EAST Based Access Tools Interface Control Document, SS/DEBAT/ICD, http://debat.c-s.fr/east/Interface Control Document [SS-DEBAT-ICD 2.0].pdf, Accessed 26 August 2013
Powell AW, Beckerle MJ, Hanson SM (2011) Data Format Description Language (DFDL) v1.0 Specification, http://www.ogf.org/documents/GFD.174.pdf, Accessed 26 August 2013
Giaretta D, Rankin S (2011) Understanding a digital object: basic representation information, in: Advanced digital preservation. Springer, Berlin Heidelberg, pp 69–138
Mbaye S (2004) Use of XML Schema and XML Query for ENVISAT product data handling, http://citeseerx.ist.psu.edu/viewdoc/download?rep=rep1&type=pdf&doi=10.1.1.136.4052, Accessed 26 August 2013
CCSDS (2012b) XML Formatted Data Unit (XFDU) Structure and Construction Rules, CCSDS 661.0-B-1, http://public.ccsds.org/publications/archive/661x0b1.pdf, Accessed 26 August 2013
ISO 19123 (2005) Geographic information – Schema for coverage geometry and functions, ISO 19123, http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=40121, Accessed 26 August 2013
Baumann P (ed.) (2012) OGC® GML Application Schema – Coverages, OGC 09-146r2, Version 1.0.1, https://portal.opengeospatial.org/files/?artifact_id=48553, Accessed 26 August 2013
Taylor M (2013) TOPCAT - Tool for OPerations on Catalogues And Tables, Version 4.0-1, www.star.bris.ac.uk/~mbt/topcat/sun253.pdf, Accessed 26 August 2013
Longley PA, Goodchild MF, Maguire DJ, Rhind DW (2010) Geographic Information Systems and Science, 3rd ed., Wiley-Blackwell
Giaretta D (2011) Cultural heritage testbed, in: Advanced digital preservation. Springer, Berlin Heidelberg, pp 387–406
Portele C (ed.) (2007) OpenGIS® Geography Markup Language (GML) Encoding Standard, OGC 07-036, Version 3.2.1, http://portal.opengeospatial.org/files/?artifact_id=20509, Accessed 26 August 2013
Shaon A, Giaretta D, Conway E, Matthews B, Yu J, Marelli F, Guarino R, Giammatteo U, Marketakis Y, Marketakis T, Brocks H, Engel F (2012) Towards a Long-term Preservation Infrastructure for Earth Science Data, in: Proceedings of the 9th International Conference on Preservation of Digital Objects (iPRES2012), October 1–5, 2012, Toronto, Canada
Baumann P (ed.) (2013) OGC® Web Coverage Service 2.0 Interface Standard - KVP Protocol Binding Extension - Corrigendum, OGC 09-147r1, Version 1.0.1, https://portal.opengeospatial.org/files/09147r1, Accessed 15 April 2014
OGC policy SWG (2009) The Specification Model – A Standard for Modular specifications, OGC 08-131r3, Version 1.0, October 2009, https://portal.opengeospatial.org/files/?artifact_id=34762, Accessed 15 April 2014
Yu J, Baumann P, Misev D, Campalani P, Albani M, Marelli F, Giaretta D, Crompton S (2014) Point of view: long-term access to earth archives across multiple disciplines. Comput Stand Interfaces 36(6):909–917
Cossu R (2012) GENESI-DEC and SCIDIP-ES, http://portal.genesi-dec.eu/presentations/GENESI-DEC-SCIDIP-ES.pdf, Accessed 15 April 2014
Nativi S, Caron J, Domenico B, Bigagli L (2008) Unidata's common data model mapping to the ISO 19123 data model. Earth Sci Inform 1(2):59–78
Domenico B, Nativi S (eds.) (2011) WCS 2.0 Interface Standard – NetCDF Encoding Format Extension, OGC 11-010, Version 2, http://www.unidata.ucar.edu/mailing_lists/archives/galeon/2011/pdf_zj1zXL2oW.pdf, Accessed 15 April 2014
Davis ER, Caron J, Domenico B (2008) Grids and beyond: netCDF-CF and ISO/OGC Features and Coverages, AMS 2008, http://www.unidata.ucar.edu/staff/edavis/PapersPresentations/2008-01-AMSGridPlus/2008-01-AMS-Paper.pdf, Accessed 15 April 2014
OMB Circular A-11, Part 6 (2014) Preparation and Submission of Strategic Plans, Annual Performance Plans, and Annual Program Performance Reports, http://www.whitehouse.gov/sites/default/files/omb/assets/a11_current_year/s200.pdf, Accessed 15 August 2014
Unidata (2014) The NetCDF Classic Format Specification, http://www.unidata.ucar.edu/software/netcdf/docs/html_guide/file_format_specifications.html, Accessed 15 September 2014