SAI Intelligent Systems Conference 2015 November 10-11, 2015 | London, UK
WISE Technology: A Scientific Information System for Astronomy and Beyond

A. Belikov, D. Boxhoorn, K. Begeman, E. Valentijn, W.-J. Vriend
Kapteyn Astronomical Institute, University of Groningen, Groningen, the Netherlands
[email protected]

O. R. Williams
Donald Smits Centre for Information Technology, University of Groningen, Groningen, the Netherlands
[email protected]
Abstract—The data processing of a number of current astronomical projects requires an intelligent data handling system which can satisfy the requirements both of users processing the data and of users exploring the results. We present the WISE Concept of Scientific Information Systems, which has been used in a number of data processing systems in astronomy. In this paper we review its origin and principal components. In particular, we discuss the new developments which allow WISE-based systems to be deployed with less effort to a greater variety of projects.

Keywords—data storage; data processing; grid; information system
I. INTRODUCTION
A typical scientific information system is not very challenging from the point of view of design and architecture, as it is almost always oriented toward the delivery of science-ready data to the end user. The data processing is separated from the information system and performed outside of it, prior to the publishing of the data set in the information system. As a result, data processing and data exploration are entirely separated and the end user normally works with a complete and consistent data set. Such a traditional approach works well if the data processing is done by a small group of scientists who can share resources (storage and processing facilities) and keep expertise (coding and quality assessment) within the group. Additionally, the end user should be satisfied with working with processed data only and not wish to request any action on the data which requires reprocessing. The traditional approach to scientific data processing implies that, after the delivery of the data product of a mission or survey, users should reprocess the data themselves if they are dissatisfied with the data quality or have a particular use case which is not accounted for in the ``mainstream'' data processing. Due to the increased volume and complexity of data, modern scientific data processing cannot rely on such an approach. The WISE concept of scientific information systems originated from a solution for astronomical survey processing
which addressed the problems described above and created systems suitable both for an almost unlimited number of scientific use cases and for the continuous reprocessing of the data. The WISE Concept is the result of experience gained in OmegaCEN (a Dutch national astronomical data center) hosted by the Kapteyn Astronomical Institute of the University of Groningen. WISE Technology is a collection of architectural and software solutions developed as a result of the implementation of the WISE Concept. The WISE concept originates from the data handling and processing system developed for the Kilo-Degree Survey (KiDS), which is called the Astronomical Wide-field Imaging System for Europe (Astro-WISE). KiDS is a European Southern Observatory public 4-band survey which is currently being performed on the VLT Survey Telescope at the Paranal Observatory, Chile [3]. The resources for the processing and storage of KiDS data were supplied by a number of astronomical institutes across Europe, which required the development of a distributed system with transparent exchange of information between different sites. Even more important was the requirement to exchange expertise in astronomical image processing. As a result, Astro-WISE was designed and developed as a complete information system which provides storage, processing, data validation, assessment and control of the data quality, automatic re-processing triggered by requests generated by the system or users and, finally, publishing of the science-ready data to the wider astronomical community. From the user's point of view, such a system can perform all data processing from the raw images to a final catalog of objects using a single entry point. From the system's point of view, no information about the data processing and the origin of the data is lost. Astro-WISE is a multi-survey system, which means that it can store and process the data for a number of projects and surveys.
Astro-WISE implements a general optical pipeline for wide-field images, which allows it to accommodate and process data from a wide range of instruments including: the Wide Field Imager at the 2.2m telescope at La Silla; the Wide
912 | P a g e 978-1-4673-7606-8/15/$31.00 ©2015 IEEE
Field Camera at the Isaac Newton Telescope, La Palma; Suprime-Cam at the Subaru Telescope, Mauna Kea; the Advanced Camera for Surveys onboard the Hubble Space Telescope; and others. The infrastructural challenges which Astro-WISE had to address include the necessity to support data processing and data storage grids. It also satisfies the need for a multi-user system with the ability to store data in private user spaces and at the same time to share selected data items within a group of users. Apart from Astro-WISE [1], the WISE Concept and WISE Technology were used to create the Lofar Long-Term Archive [2], the MUSE processing system, the Euclid Archive System prototype and other systems for scientific data analysis. In this paper, the next section contains an analysis of the requirements for a scientific information system capable of supporting data processing. This is followed by a specification of a data-centric scientific information system. Then the components of such an information system are reviewed and its advantages discussed. Finally, the enhancements performed over the last five years of development are described: these allow WISE-based systems to be deployed with less effort to a greater variety of projects.
II. THE REQUIREMENTS ON THE SYSTEM
The design of any information system starts with the analysis of the requirements generated by its potential users. As described in the introduction, there are two groups of users with different types of requirements on the system: end users, who are interested in the scientific exploitation of the data, and the processing team, which is responsible for the quality of the final data products. The latter group is particularly interested in the ability to control data processing from the raw data to the final data products and to assess the quality of the produced data at each step of the processing. Furthermore, this group requires a system adaptable to new storage and processing solutions, with the ultimate goal of making it as cheap as possible. The former group is interested in a variety of use cases which require a flexible interface supporting user-specified queries. Additional requirements come from the need for long-term sustainability of the data processing, since the data assimilation stage of a scientific project may take many years. In the case of big projects, the involvement of a number of institutions and the distribution of expertise in the field demand a distributed system. We can summarize the top-level requirements on a scientific information system as follows:
1) Scalability. Any part of the system (e.g. data storage, data processing, metadata storage, interfaces and services) should be scalable, to allow an increase in the amount of incoming data or in the number of users involved in the data processing. On top of the infrastructural scalability, the system should be scalable with respect to the data processing algorithms and pipelines: allowing the implementation of new pipelines and the derivation of improved results from the same raw or intermediate data with new algorithms. Scalability of data mining should also be possible: the system should satisfy many possible kinds of requests, from the retrieval of a single data item specified by an identifier to a complicated study involving multiple complex queries.
2) Distributed nature. The requirement for a distributed system applies not only to the infrastructure, which can be distributed over a number of institutions, but also to access: the derivation of a result should be possible at any site where the system is deployed.
3) Traceability. This is not limited to dependencies between data products. All activity in the system should leave a clear trail, so that it is possible to trace the origin of any change in the data and to find the algorithm, program and user who created a data item. This allows the sharing of knowledge amongst all users of the system and shows an end user exactly how a data product was produced.
4) Reproducibility. It should be possible to reproduce each data product with the same characteristics with which it was originally created by the system. Reproducibility from the input (raw) data can be used to the advantage of the system, since it allows the removal of intermediate data products.
5) Adaptability. It should be possible to adapt the system to a number of different scientific use cases: providing resources, pipelines and expertise to perform data processing according to users' varied interests in the same data set.
The requirement to trace any changes of a data product during processing, including the input and output for this data product, and the requirement to be able to reproduce the data product, put enormous stress on the data model used by the system to implement these features. Consequently, the data model itself becomes the core of the system, with a specific set of requirements on it:
1) The data model must provide complete data provenance, i.e. both data ingested into the system and data created inside the system should be described with a sufficient level of detail to be able to reproduce entirely any products or results;
2) The data model must provide full data lineage, i.e. each data product and literally each bit of information (in the case of an astronomical image) must be traceable back to the origin and creator of the information;
3) The data model must be as tolerant as possible to changes in the data processing, i.e. changes in the pipelines which are incorporated into the system;
4) The data model must support both data reprocessing and data reproduction, allowing the user to employ the information system to refine pipelines. This is required because the data processing in the scientific information system described in this paper is an iterative process, which often requires changes in the pipelines themselves.
The set of requirements on the infrastructure reinforces the top-level requirements and can depend on the resources available for the particular information system, but in the
general case they should include the ability to share hardware resources (both data processing and data storage) between sites. The information system should avoid a single centralized element, which could become a bottleneck for the whole system, and should be robust to failures at single sites. Requirements on user interfaces and services are also system-specific, but these interfaces and services should provide the user with the ability to trace data products, reprocess them on request and perform extensive data mining with visualization of the results. On reviewing the requirements described above, the dependencies between data products and the stress they put on the data model, it was concluded that the data processing system being developed for KiDS should switch paradigm: it was decided to move from a processing-centric system, which has been typical for astronomy, to a data-centric system.
III. A DATA-CENTRIC APPROACH TO DATA PROCESSING
The traditional approach to data processing in science is processing-centric. In this approach the pipeline is at the center of the data processing and the archive is used only for the storage and distribution of the raw (input) data and the final data products. Two core requirements of a scientific information system – to provide traceability and reproducibility of data products – can be achieved by a re-run of the pipeline from the raw data. Data-centric processing places the data, the data model and operations on the data at the center of the system: monitoring each data item and linking each data product to the set of data products used to produce it, the processing parameters employed in the processing and the user who processed it. The main feature of an information system built on a data-centric approach is the potential for automatic organization and enrichment with new data. The data-centric approach implies both detailed modeling and an awareness of how things change in time. Important changes, both for each particular data item in the system and for the system as a whole, range over:
1) New data entering the system, which includes the ingestion of new data products or physical changes (e.g. the gain of a sensor) affecting the configuration of the system;
2) Modifications of the source code to reflect advances in human understanding of the physical changes;
3) Changes in our model of the world (e.g. the cause and modeling of the gain variations of the sensor).
A system which is able to cope with all these changes can be used not only to accumulate and process the data from the instrument, but also to calibrate the instrument itself. The use of better calibration will gradually improve the quality of the data processing and of the final data product. The ideal information system would seamlessly cope with all these changes in time, thus creating a living environment for long-term digital preservation.
To our knowledge, Astro-WISE was one of the first systems attempting to reach this goal.
IV. THE WISE CONCEPT
An information system usually consists of three components: a data layer, business rules and interfaces. To implement a data-centric approach in Astro-WISE, the data layer was separated into two parts: the pure measurement data, hereafter called the data layer (or data files), and everything beyond that, hereafter called the metadata layer. Metadata ranges from file sizes to statistics of pixel values and detected events in the measurements. The data layer keeps the bulk of the data unchanged in data files (after ingestion, changes to files are not permitted). Meanwhile, all the information updated or created during the data processing, during the evaluation of the quality, or due to changes of access privileges to the data is kept within the metadata layer. The metadata layer supports the implementation of a data model, while the data layer supports the storage of the data as files in a standard format (FITS in the case of Astro-WISE). Business rules, implemented in Python classes, bind the metadata and data layers. Interfaces provide user access to both the business rules and the data. In the case of Astro-WISE the business rules are present in:
1) the metadata layer (expressed in the Data Definition Language used to create a data model);
2) the data layer (with on-the-fly compression of the data on the data storage nodes);
3) a number of pipelines and programs which the user defines to process the data.
This last part of the business rules we will call the data processing layer; it is the part most apparent to the user. To implement Astro-WISE we use abstractions of storage, processing and database capabilities as the basis of the infrastructure for each of the layers of the system. The separation of these three infrastructures plays a key role in the flexibility of the system:
1) The metadata layer is realized in a relational DBMS through an abstraction of the required database functionality.
2) The data layer is put on the distributed Astro-WISE data storage nodes through an abstraction of the required storage functionality.
3) The processing layer connects the user and the data and metadata layers, by a number of interfaces, to the separate data processing facilities (high-performance clusters or grid machines).
The metadata layer implements a list of functionalities to satisfy the requirements on the system:
• Inheritance of data objects. Using object-oriented programming, all objects within the system can inherit key properties of the parent object. All these properties are made persistent.
• Full lineage. The linking (associations, references or joins) between object instances in the database is maintained completely. Each data item in the system can be traced back to its origin. The tracing of a data object can be both forward and backward.
• Consistency. At each processing step, all processing parameters and inputs which are used are kept within the system.
• Massive-scale parallel and distributed processing. The administration of asynchronous processing is recorded in the metadata layer in a natural way.
The requirement to have a scalable and distributed system is implemented by the distribution of all components and the support of multiple users. These features propagate as key principles throughout the realization of the metadata layer, data layer and business rules which form the core of the WISE approach:
1) Component-based software engineering (CBSE). This is a modular approach to software development; each module can be developed independently and wrapped in the base language of the system (Python) to form a pipeline or workflow.
2) An object-oriented common data model used throughout the system. This means that each module, application and pipeline deals with a unified data model for the whole cycle of data processing, from the raw data to the final data product.
3) Persistence of all the data model objects. Each data product in the data processing chain is described as an object of a certain class and saved in the archive of the specific project along with the parameters used for data processing.
The Astro-WISE system, together with the other systems based on the WISE Concept created subsequently, is implemented using the Python programming language, which also allows any external program to be wrapped into a Python module, library or class. The use of Python combines the principles of modular programming with object-oriented programming, so that each package in the system can be built and run independently, with an object-oriented data model serving as glue between modules. At the same time, the logic behind pipelines and workflows in Astro-WISE allows the execution of any part of the processing chain independently of the other parts.
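The inheritance, lineage and persistence principles above can be sketched in a few lines of Python. This is an illustrative stand-in, not the actual Astro-WISE implementation: the names `persistent`, `DataObject`, `Frame` and `ReducedFrame` are assumptions made for the example.

```python
class persistent:
    """Marks a class attribute as persistent (stored in the metadata layer)."""
    def __init__(self, doc, default=None):
        self.doc, self.default = doc, default

class DataObject:
    """Base class: collects persistent attributes, including inherited ones."""
    def __init__(self, **kwargs):
        self.inputs = []  # lineage: the objects this one was derived from
        for name, prop in self.persistent_attributes().items():
            setattr(self, name, kwargs.get(name, prop.default))

    @classmethod
    def persistent_attributes(cls):
        attrs = {}
        for klass in reversed(cls.__mro__):  # walk parents first: inheritance
            for name, value in vars(klass).items():
                if isinstance(value, persistent):
                    attrs[name] = value
        return attrs

class Frame(DataObject):
    filename = persistent("name of the FITS file")
    instrument = persistent("instrument that produced the frame")

class ReducedFrame(Frame):
    # inherits filename and instrument; adds its own persistent attribute
    bias_level = persistent("subtracted bias level", default=0.0)

raw = Frame(filename="raw.fits", instrument="OMEGACAM")
red = ReducedFrame(filename="red.fits", instrument="OMEGACAM", bias_level=210.5)
red.inputs.append(raw)  # backward lineage: red can be traced to raw
```

In a real WISE system the persistent attributes would additionally be mapped to database columns, and the `inputs` links would be stored as references in the metadata database.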
V. DATA MODEL IMPLEMENTATION

As discussed above, the data model has a key role in the building of a scientific information system. In the original Astro-WISE system the data model was introduced directly in the Python classes which formed the core of the data processing pipeline. In the systems created subsequently, based on the WISE Concept, it became possible to specify the data model using XML or XSD. The user is able to describe a data model with an object-oriented approach and interact with the data based on this description, while leaving it to the system to implement the data model. Objects constructed according to the model produce the metadata for the data items in the information system. These objects keep information about the origin of the data products they describe, the pipelines used to produce them, the input parameters and other data products. Such information in the metadata can be used to prevent unnecessary reprocessing, as a source for quality control, or for the planning of the next processing steps. The metadata can also be used for the implementation of access right controls, which allow users to share data or to protect them from unauthorized access.

Fig. 1. Data model propagation in the Astro-WISE system [1]

Fig. 2. rawFits definition in XSD

Figure 1 illustrates the implementation of the data model, defined by pipelines, in Python classes and a relational
database schema. At the beginning, the designer creates a description of the data model with the help of Python classes or some higher abstraction-level language (XSD, for example). The latter is converted to Python classes which are based on a few parent classes interfacing the metadata database and the storage facilities. Any attribute of a Python class can be declared persistent, meaning that the corresponding structure for this attribute will be created in the metadata database. The collection of all Python classes creates a metadata database schema which can then be used in a WISE information system. This schema preserves not only the data but also all the dependencies created by the designer of the original object-oriented data model. The user does not access the metadata database directly, but instead operates with objects implemented by Python classes. The complexity of the relational database behind the system is thereby hidden from the user. The same approach allows the creation of interfaces and services by writing simple Python code reusing standard Python libraries alongside the WISE Python classes.
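As a sketch of how a metadata database schema could be derived from a collection of Python classes, the toy code below maps each persistent attribute to a column. The class names, the `persistent` dictionaries and the type mapping are all illustrative assumptions; the real Astro-WISE machinery is not reproduced here.

```python
# Oracle-style column types for the illustrative persistent attributes.
TYPE_MAP = {str: "VARCHAR2(256)", float: "NUMBER", int: "NUMBER"}

class BaseFrame:
    # attribute name -> Python type; a stand-in for declaring persistence
    persistent = {"filename": str, "instrument": str}

class RawFrame(BaseFrame):
    # the child class keeps the parent's persistent attributes
    persistent = {**BaseFrame.persistent, "exptime": float}

def ddl_for(cls):
    """Emit a CREATE TABLE statement mirroring the class's persistent attributes."""
    cols = ", ".join(f"{name} {TYPE_MAP[typ]}" for name, typ in cls.persistent.items())
    return f"CREATE TABLE {cls.__name__} ({cols})"
```

Running `ddl_for(RawFrame)` yields one table per class, with the inherited dependencies preserved in the shared columns, which is the essence of the class-to-schema propagation shown in Fig. 1.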
VI. DATA MODEL ON DEMAND
Due to the data-centric nature of the systems based on the WISE Concept, a crucial step in the development of such a system is the creation and implementation of a complex data model. Originally, developers and programmers specified the data model directly in Python code. This worked well for Astro-WISE and the LTA, but was found to present an unnecessarily steep learning curve which hindered commercial use. Even in scientific projects, it was necessary for the data modeler to also be proficient in Python. Additionally, it creates a problem when the number of data model developers increases from dozens (Astro-WISE) to hundreds (Euclid). To overcome these problems, different ways of expressing data models have been developed. MUSE-WISE, for example, uses a model expressed in XML. This is automatically turned into the Python code necessary to create and populate the physical implementation of the data model in a database. The Euclid project adopted a similar process for its enormously complex data model, which is described in XSD. These methods both simplify the process of implementing a complex data model and ease its maintenance when changes in the data model are made.
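The XSD-to-Python route can be illustrated with the standard library alone. The minimal XSD fragment and the generated class below are hypothetical stand-ins for the real definitions in Fig. 2 and Fig. 3, which are not reproduced here.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal rawFits definition; the real Euclid/MUSE XSD is far richer.
XSD = """<schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="rawFits">
    <xs:element name="filename" type="xs:string"/>
    <xs:element name="instrument" type="xs:string"/>
    <xs:element name="exptime" type="xs:float"/>
  </xs:complexType>
</schema>"""

XS = "{http://www.w3.org/2001/XMLSchema}"

def class_from_xsd(xsd_text):
    """Build a Python class whose attributes mirror the XSD element names."""
    root = ET.fromstring(xsd_text)
    ctype = root.find(f"{XS}complexType")
    fields = [el.get("name") for el in ctype.findall(f"{XS}element")]
    name = ctype.get("name")
    clsname = name[0].upper() + name[1:]  # rawFits -> RawFits, as in the paper

    def __init__(self, **kw):
        for f in fields:
            setattr(self, f, kw.get(f))

    return type(clsname, (object,), {"__init__": __init__, "fields": fields})

RawFits = class_from_xsd(XSD)
img = RawFits(filename="r1.fits", instrument="OMEGACAM", exptime=300.0)
```

A production code generator would additionally emit the persistence declarations and database bindings discussed in the previous section; this sketch only shows the model-to-class step.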
Fig. 3. RawFits Python class created from the rawFits definition to accommodate objects ingested by the user
For example, the description of an image (which will typically come from an astronomical instrument as a file in FITS format which is ingested into the system) is given in the XSD definition rawFits (Fig. 2), which is turned into the Python class RawFits (Fig. 3). Fig. 2 illustrates how complex the definition can be and how it can include references to a number of objects of different classes. The data processing in the system starts with the ingestion of the raw image obtained from the instrument and the creation of an object in the metadata storage according to the RawFits class (or any other class designed for the raw data). It is important to note that RawFits is the first step in the processing chain: at each step in this chain, objects will be created which refer back to the original object of the RawFits class. This object contains all the metadata for the image and the path to the image file. In many cases the queries of the user can be satisfied with the metadata alone, without the necessity to inspect the file itself. The user can browse the system for any attribute of the class, including moving down through the chain of dependencies. In Figure 4 a user requests the system to return all raw FITS files for the OmegaCAM instrument (which is used for the KiDS survey) and retrieves the storage URL for the first object in the list.
Fig. 4. Querying operations on RawFits objects with the Astro-WISE CLI
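Fig. 4 itself is not reproduced here; the toy code below mimics the kind of query-by-attribute it shows, using an in-memory list as a stand-in for the metadata database. The class layout, the `select` method and the URLs are assumptions made for illustration.

```python
class RawFits:
    catalog = []  # stand-in for the metadata database

    def __init__(self, filename, instrument, url):
        self.filename, self.instrument, self.url = filename, instrument, url
        RawFits.catalog.append(self)

    @classmethod
    def select(cls, **criteria):
        """Return all objects whose attributes match the given criteria."""
        return [o for o in cls.catalog
                if all(getattr(o, k) == v for k, v in criteria.items())]

RawFits("r1.fits", "OMEGACAM", "http://ds1.example.org/r1.fits")
RawFits("r2.fits", "WFI", "http://ds2.example.org/r2.fits")

# analogous to: "return all raw FITS files for the OmegaCAM instrument"
frames = RawFits.select(instrument="OMEGACAM")
storage_url = frames[0].url
```

In the real CLI such a selection is translated into a database query, so only matching metadata records are transferred; files are fetched from the dataservers only when the user explicitly asks for them.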
VII. ASTRO-WISE NODES
The typical composition of a node of the Astro-WISE information system (located at a particular geographical location) consists of data storage (one or more Astro-WISE dataservers), metadata storage (implemented in an Oracle RDBMS), a Distributed Processing Unit interfacing the system to the available processing resources, and user services including a Command Line Interface. All nodes are connected through the internet and can interact at the component level. As an example, all Astro-WISE dataservers (data nodes) communicate with each other and build a single data storage grid with the ability for the user to access a file stored on any dataserver from any other dataserver or client application. Figure 5 shows the current composition of Astro-WISE (with 6 nodes across Europe) and also the details of the Astro-WISE node in the OmegaCEN data center at the University of Groningen. All these components are optional: an Astro-WISE node can be installed without dataservers (in which case dataservers from other nodes are used), without a metadata database or without processing facilities. The only restriction is that there must be at least one
metadata database and data storage component in the overall information system.
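The node-composition rule just stated can be expressed compactly. The sketch below, with illustrative class and function names, checks that a set of nodes forms a valid system: every component is optional per node, but the system as a whole needs at least one metadata database and at least one storage component.

```python
class Node:
    """Illustrative description of one Astro-WISE node's optional components."""
    def __init__(self, name, dataservers=0, has_metadata_db=False, has_dpu=False):
        self.name = name
        self.dataservers = dataservers
        self.has_metadata_db = has_metadata_db
        self.has_dpu = has_dpu

def system_is_valid(nodes):
    """The overall system needs >= 1 metadata database and >= 1 data storage."""
    return (any(n.has_metadata_db for n in nodes)
            and any(n.dataservers > 0 for n in nodes))

groningen = Node("OmegaCEN", dataservers=4, has_metadata_db=True, has_dpu=True)
client_only = Node("client site")  # no local storage, database or DPU
```

A client-only node is valid as long as some other node in the federation supplies the missing components, exactly as described above.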
Fig. 5. Example of a complete Astro-WISE node (OmegaCEN)

VIII. AUTHORIZATION AND AUTHENTICATION SYSTEM
Astro-WISE is a multi-user system which must accommodate data storage and processing performed by a number of scientists. To promote collaborative work, the system should allow the user to share data, or keep data private, as required. Each user in the Astro-WISE system has an identity protected by a password. A user who wants to make use of the option to submit jobs to Grid resources must also obtain a Grid certificate. The authorization and authentication system is implemented at the level of the metadata database. When a user logs in with his username and password, the user's privileges are checked in the database. Subsequently, the user can browse the data according to these privileges, whilst obeying the wishes of the data owners. Each data item (i.e. an object in the metadata database and its associated data files) in the system has a scope of visibility: data items are grouped by projects, where a project is a collection of resources associated with a group of users who process the same or related collections of data items. Access to the data is based on three attributes which every data item in Astro-WISE has:
• User: identifies the user who created the data item;
• Project: defines the project to which the data item belongs;
• Privileges: defines who is able to use the data item.
These three attributes are initialized the first time the data item is made persistent in the Astro-WISE system (including the case that the item is created by one of the Astro-WISE pipelines) and stay persistent for the life cycle of the data item. In the case of Astro-WISE there are 5 levels of access to the data, dictated by the privilege attribute of the data item:
• Privilege 1: used for a user's private data, which is visible only to the creator of the data item;
• Privilege 2: used for project data, which is shared with all users within the project;
• Privilege 3: used for Astro-WISE data, which is shared with all named users of Astro-WISE;
• Privilege 4: used for public data, which is visible to the anonymous user of Astro-WISE;
• Privilege 5: used for Virtual Observatory data, which is accessible through the Virtual Observatory interfaces. This is an additional level of public data, used to ensure that only validated and qualified public data are provided to the VO.

IX. INTEGRATION OF EXTERNAL STORAGE AND PROCESSING RESOURCES

Fig. 6. Example of the integration of non-homogeneous storage and processing resources in Astro-WISE

The original Astro-WISE solution for the storage of data files referenced from the metadata layer is the Astro-WISE dataserver. At its core, the dataserver is a simple HTTP server written in Python which is installed on top of any POSIX file system and provides the Astro-WISE user with the ability to store and retrieve files. Dataservers can be configured in groups and can exchange information about files. Unique filenames are used; consequently, a user can request any dataserver for a file: if it is missing on that dataserver, the remaining dataservers in the group are asked to provide the file, which is then transparently delivered to the user. Dataservers are independent from each other, which provides a high degree of scalability. The current Astro-WISE dataservers manage 36,172,381 files on 1.6 PBytes of storage space distributed over 6 geographically distinct locations. The original dataservers are completely decoupled from the metadata database: the filename is the only information needed to retrieve a file. There is a mechanism to cache requested files on the dataserver from which the user requested the file, so the data storage system has a degree of self-organization depending on the frequency of requests. The early versions of the Astro-WISE system employed only resources dedicated to the system and managed by it. In the development of systems based on Astro-WISE (especially the Lofar Long-Term Archive) users requested the ability to incorporate external storage and processing resources into the system, i.e. to store and process data on elements which are not managed by the information system itself. In the case of Lofar, the processing interface of Astro-WISE (known as a DPU) was modified to be able to submit jobs to a Grid Computing Element and permanently store data on Grid Storage Elements, while at the same time keeping all the features of the original Astro-WISE system, including the full data lineage. The metadata database serves as a global file system catalog, allowing the storage of data files on a number of storage solutions. Each storage solution should be integrated at the data layer level by providing an interface which is able to execute the same set of commands (store, retrieve, delete, check); each data processing element should be interfaced through a Distributed Processing Unit (DPU). Figure 6 shows the general overview of user access to Astro-WISE and Grid resources (the Netherlands BiGGrid in this case), including computing elements (CE) and storage elements (SE). An Astro-WISE user who wishes to store data on the Grid or to submit jobs to the Grid infrastructure must first obtain a Grid certificate. This certificate is then used to create a proxy certificate on a MyProxy server which will, in turn, be used by the DPU to submit a job to the Grid or to store data. As soon as the proxy has been created, the user can execute Astro-WISE commands to operate on the Grid. For processing, the DPU server was modified to use a proxy from the MyProxy server during the submission of a job to the Grid. The user can select to which DPU (and hence to which computing elements) to submit the job and which storage elements to use. In the case of a Grid storage element, the user's certificate will be checked to verify the user and the user's membership of the Virtual Organization involved. The Virtual Organization Membership Service (VOMS) is used to assign roles to the user within the Virtual Organization; the DPU takes care of adding the required VOMS role to the user's Grid proxy. The Grid Storage Element is an example of the inclusion of a storage solution external to the original system. “External” here means that the system does not control the resources of the storage solution.
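The dataserver behaviour described earlier in this section, with unique filenames, transparent fall-through to the rest of the group and caching on the requested server, can be sketched with dict-backed stand-ins for the real HTTP servers. All names here are illustrative.

```python
class Dataserver:
    """Toy stand-in for an Astro-WISE dataserver in a group of peers."""
    def __init__(self, name, files=None):
        self.name, self.files, self.peers = name, dict(files or {}), []

    def retrieve(self, filename):
        if filename in self.files:
            return self.files[filename]
        for peer in self.peers:              # ask the rest of the group
            if filename in peer.files:
                data = peer.files[filename]
                self.files[filename] = data  # cache on the requested server
                return data
        raise FileNotFoundError(filename)

ds1 = Dataserver("ds1", {"a.fits": b"AAA"})
ds2 = Dataserver("ds2", {"b.fits": b"BBB"})
ds1.peers, ds2.peers = [ds2], [ds1]

data = ds1.retrieve("b.fits")  # served transparently via ds2, then cached on ds1
```

The caching step is what gives the storage grid its self-organizing character: frequently requested files migrate toward the servers they are requested from.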
Currently the list of external storage solutions which can be deployed for WISE systems includes Grid, iRODS and sftp servers. The list can be extended on request to the designers of a particular information system.
X. INTERFACES AND SERVICES
Fig. 7. Classes of Astro-WISE services and interfaces
For the end user the information system is only as good as its interfaces and services. The WISE concept supports the construction of a number of appropriate services, from a simple metadata browsing service (dbview) to the Target Processor, which allows the user to request that the system execute subsequent steps in the data processing [3]. The original Astro-WISE services use Python classes and Python objects populated with metadata records to create an interface between the user and the information system. Fig. 7 gives an overview of how the Astro-WISE services are mapped onto the three main layers of the information system. For each layer we created a basic API, realized as a collection of methods for each class in the object-oriented data model of Astro-WISE, e.g. the store() and retrieve() methods for the data files in the data layer.

Based on typical requests to the system, the following types of interaction of the user with the system can be identified:
1) metadata browsing – usually a simple selection of data items, querying on a single attribute or a small number of attributes of the data item;
2) user interaction – the user Command Line Interface (Fig. 4);
3) data exploration – advanced data mining with subselection and modified requests;
4) data processing – the ability to create new data items by launching the data processing;
5) visualization – the ability to inspect a data item, especially astronomical images in the case of the original Astro-WISE system.

Each service in Fig. 7 implements one or more of these types of user interaction. For example, dbview is the core user service for browsing metadata: it supports the execution of requests to the metadata database (both QBE and SQL) and the visualization of images selected as a result of the request. All Astro-WISE interfaces are built from standard building blocks written in Python.
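The per-class API and the query-by-example (QBE) style of metadata browsing can be sketched as follows. The store()/retrieve() method names and the QBE idea come from the text; the class names, the select() signature and the in-memory registry standing in for the metadata database are assumptions for illustration only.

```python
class PersistentObject:
    """Illustrative sketch: an instance's attributes form its metadata
    record; the class-level registry stands in for the metadata
    database of the real system."""
    _registry = []

    def __init__(self, **attrs):
        self.__dict__.update(attrs)

    def store(self):
        """Persist this object's metadata record (sketch)."""
        type(self)._registry.append(self)

    @classmethod
    def select(cls, **criteria):
        """Query-by-example: return stored objects whose attributes
        match all the given example values."""
        return [obj for obj in cls._registry
                if all(getattr(obj, k, None) == v
                       for k, v in criteria.items())]

class RawFrame(PersistentObject):
    """Hypothetical data-model class; each class keeps its own table."""
    _registry = []
```

Usage: `RawFrame(name="f1", filter="r").store()` followed by `RawFrame.select(filter="r")` returns the matching frames, mirroring how a QBE request selects on a small number of attributes.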
At the core of each service (except the CLI) is a web server written in Python, which invokes a number of services as Python modules.
Fig. 8. Target Processor service with data processing steps for KiDS
918 | P a g e 978-1-4673-7606-8/15/$31.00 ©2015 IEEE
SAI Intelligent Systems Conference 2015 November 10-11, 2015 | London, UK
One example of a user service is the Astro-WISE Target Processor, which implements the "Target" concept of data processing (Fig. 8). The Target Processor allows the user to review the status of the data processing at each step and to select the data products to be created by the system. A user requests a certain object in the processing chain; the system then determines whether such an object exists and, if not, what should be done to create it. In the most extreme case only the raw objects exist and everything downstream is processed by the Target Processor until the user-requested object is made. The Target Processor traverses the data model using the persistent dependencies of the classes. For each dependency the Target Processor checks whether an instance of the dependency already exists in the database and, if so, whether the dependency is up-to-date (Fig. 9).
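The traversal described above is essentially a recursion over the persistent dependencies: an up-to-date instance is reused, anything missing or stale is (re)made after its own dependencies. The sketch below illustrates that logic under simplifying assumptions; the class names, the `depends` attribute and the dictionary standing in for the database are hypothetical, not the real Astro-WISE data model.

```python
class Target:
    """Sketch of target-driven processing (simplified assumptions)."""
    depends = []  # persistent dependencies, as classes

    @classmethod
    def get_existing(cls, db):
        """Return a stored, up-to-date instance, or None."""
        obj = db.get(cls.__name__)
        return obj if obj is not None and obj.up_to_date else None

    @classmethod
    def make(cls, db):
        """Return an up-to-date instance of cls, recursively (re)making
        only those upstream products that are missing or stale."""
        obj = cls.get_existing(db)
        if obj is not None:
            return obj                      # reuse: nothing to do
        inputs = [dep.make(db) for dep in cls.depends]  # recurse first
        obj = cls()                         # "process" the new product
        obj.inputs, obj.up_to_date = inputs, True
        db[cls.__name__] = obj              # persist it
        return obj

# A toy three-step chain (names are illustrative):
class RawFrame(Target): pass
class ReducedFrame(Target): depends = [RawFrame]
class SourceList(Target): depends = [ReducedFrame]
```

Requesting `SourceList.make(db)` when only a raw frame exists makes the reduced frame and then the source list, which matches the "most extreme case" in the text where everything downstream of the raw data is processed on demand.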
In this development the WISE approach was used to create new information systems, extending the original Astro-WISE with new data models, new data storage and processing capacities, and new fields. Significant scientific projects which have used WISE technology during this project include GLIMPS [5], the LOFAR Long Term Archive [2] and MUSE-WISE [6]. In addition, a number of commercial spin-offs have been led by the company Target Holding. To facilitate commercial exploitation, a considerable effort is being made to produce versions of the core WISE software which can be used with a variety of back-end databases. For similar reasons, easy-to-install versions of the WISE software with out-of-the-box capability are required.
Fig. 9. View of reprocessing request with all input data products and parameters
The data lineage enables the information system to provide the user with up-to-date information about the quality assessment done automatically by the system's pipelines or by other users (Fig. 10). The quality is accessible not only for a particular data product but for all its parent and child data products, enabling the steps in the data processing which influence the quality assessment of the whole processing chain to be found.
Fig. 10. The Quality service of Astro-WISE
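Collecting quality information over a full lineage amounts to walking the dependency graph in both directions from a given data product. The sketch below illustrates this; representing products as dictionaries with `parents`/`children` links and string quality flags is an assumption for illustration, not the persistent representation used by the real system.

```python
def lineage_quality(obj, seen=None):
    """Collect quality flags for a data product and its entire lineage,
    walking both parents and children (sketch; attribute names are
    illustrative assumptions)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # lineage links are bidirectional,
        return {}                # so guard against revisiting nodes
    seen.add(id(obj))
    flags = {obj["name"]: obj["quality"]}
    for related in obj.get("parents", []) + obj.get("children", []):
        flags.update(lineage_quality(related, seen))
    return flags
```

A single call on any product in the chain therefore surfaces, say, a suspect flag on an upstream calibration frame, which is exactly how a problematic processing step can be located from anywhere in the chain.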
XI. TARGET

During the last few years, development of the WISE concept has given rise to the Target project. Target is an expertise center in the northern Netherlands which is building a cluster of sensor-network information systems and fosters cooperation between a number of scientific projects and business partners, including IBM and Oracle. Target has created and supports a hardware infrastructure hosting tens of petabytes of data for projects in astronomy, medicine, artificial intelligence and biology.

XII. FUTURE DEVELOPMENT

All the projects mentioned above are living projects, with systems which are constantly evolving. In addition, new projects are emerging which make use of the WISE concept. Euclid is the dark-energy and dark-matter satellite of the European Space Agency, due for launch in 2020. The Euclid Archive System is a complete information system which will stand at the core of the Euclid Science Ground Segment [7]. The Euclid Archive System will be the sole method of transmitting data within the Euclid consortium and disseminating data outside the consortium [8]. The Euclid
Archive System Prototype is a functional information system which uses many of the WISE concepts and is already providing service to the Euclid community [9]. MICADO [10] is the Multi-AO Imaging Camera for Deep Observations, which has been designed to work with adaptive optics on the 40-m class European Extremely Large Telescope; the start of operations is anticipated in the mid-2020s. MICADO has already adopted a WISE data-centric approach to its data management system, and preliminary work on MICADO-WISE is underway.

ACKNOWLEDGMENTS

This work was partly performed as part of the Target project. The Target project is supported by Samenwerkingsverband Noord Nederland. It is financially supported by the European Fund for Regional Development, the Dutch Ministry of Economic Affairs (Pieken in de Delta), the Province of Groningen and the Province of Drenthe.

REFERENCES

[1] K. Begeman et al., The Astro-WISE datacentric information system, Experimental Astronomy 35, 1, 2013
[2] K. Begeman et al., LOFAR Information System, Future Generation Computer Systems 27, 319, 2011
[3] J.T.A. de Jong et al., The Kilo-Degree Survey, Experimental Astronomy 35, 25, 2013
[4] A.N. Belikov, W.-J. Vriend, G. Sikkema, Astro-WISE Interfaces, Experimental Astronomy 35, 301, 2013
[5] L.K. Teune, Glucose metabolic patterns in neurodegenerative brain diseases, PhD dissertation, University of Groningen, 2013
[6] J. Pizagno, O. Streicher, W.-J. Vriend, Integration of the MUSE Software Pipeline into the Astro-WISE System, Proceedings of Astronomical Data Analysis Software and Systems XXI, ASP Conference Series 461, 557, 2012
[7] F. Pasian et al., Science ground segment for the ESA Euclid mission, Proceedings of Software and Cyberinfrastructure for Astronomy II, SPIE Proceedings 8451, 2012
[8] O.R. Williams, A. Belikov, J. Koppenhoefer, Data transmission, handling and dissemination issues of Euclid data, in Proceedings of the NETSPACE Workshop, ed. O. Sykioti & I.A. Daglis (NOA, Athens), 11, 2014
[9] A. Belikov et al., Euclid Archive System Prototype, in Proceedings of the 2014 conference on Big Data from Space (BiDS'2014), ed. P. Soille and P.G. Marchetti, 346, 2014
[10] I.S. McLean et al., Proceedings of Ground-based and Airborne Instrumentation for Astronomy III, SPIE Proceedings 7735, 2010