VIEW: a VIsual sciEntific Workflow management system

Artem Chebotko, Cui Lin, Xubo Fei, Zhaoqiang Lai, Shiyong Lu, Jing Hua, Farshad Fotouhi
Department of Computer Science, Wayne State University
5143 Cass Avenue, Detroit, Michigan 48202, USA
{artem, cuilin, xubo, kevinlai, shiyong, jinghua, fotouhi}@wayne.edu

Abstract

In this demo, we present the current status of our VIsual sciEntific Workflow management system, called VIEW, highlighting two features: (i) the use of Semantic Web technology to represent, store, and query provenance metadata, leading to an interoperable and extensible provenance system, and (ii) support for visualizing various provenance graphs and the intermediate or final data products of a workflow run in the form of medical images or 3-D graphical models. VIEW integrates the interoperability, extensibility, and reasoning advantages of Semantic Web technology, the querying and storage power of an RDBMS, and the visual appeal of visualization techniques.

1. Introduction

A scientific workflow is a formal specification of a scientific process. It represents, streamlines, and automates the analytical and computational steps that an e-scientist goes through, from dataset selection and integration, through computation and analysis, to final data product presentation and visualization. A Scientific Workflow Management System (SWMS) supports the specification, execution, monitoring, re-run, and provenance tracking of scientific processes [1, 3, 4, 2]. A scientific workflow approach can greatly facilitate and speed up the development and deployment of complex and evolving scientific applications, optimize their performance, and support the reproducibility of scientific discovery through provenance tracking.

In this demo, we present the current status of our VIsual sciEntific Workflow management system, called VIEW, highlighting two features: (i) the use of Semantic Web technology to represent, store, and query provenance metadata, leading to an interoperable and extensible provenance system, and (ii) support for visualizing various provenance graphs and the intermediate or final data products of a workflow run in the form of medical images or 3-D graphical models. VIEW integrates the interoperability, extensibility, and reasoning advantages of Semantic Web technology, the querying and storage power of an RDBMS, and the visual appeal of visualization techniques.

We currently use VIEW for scientific workflows in medical imaging and visualization, an area that requires the management of heterogeneous data types and metadata, complex user input and interaction, and non-trivial visualization of workflow data products.

2. Overall Architecture of VIEW

The high-level architecture of VIEW is presented in Figure 1. It currently includes a workbench, a workflow engine, a provenance server, a MySQL database server, and a provenance explorer. In the following, we discuss and demonstrate the main features of VIEW.

2.1 Workbench and Workflow Engine

Figure 1. Architecture of the VIEW System. Components: Workbench, Workflow Engine, Provenance Server, Provenance Explorer, MySQL. Labels: Workflow Design; Workflow Execution; Provenance Collection; Setup; Record; Query; Query, Browsing, Visualization.

The VIEW workbench provides an intuitive GUI for designing workflows. An e-scientist can drag and drop a task box that represents an analytical or computational step, specify its input and output ports, configure its implementation, and connect its ports to other tasks through dataflow connections. A screenshot of the design of a sample workflow that performs cocluster analysis of cortico-cortical fiber tracts in VIEW is shown in Figure 2.

Figure 2. VIEW Workbench

The workbench stores the workflow design document, represented in RDF format, in the provenance server for future querying. The workflow engine supports the execution of a workflow and the collection of provenance metadata in RDF format, using the vocabulary defined in the OWL provenance ontology (PO) that we designed, which currently includes over 30 classes and 40 properties for describing workflow provenance. Provenance metadata is also stored in the provenance server.
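The design and execution steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VIEW's implementation: the `Task`, `Workflow`, and `run` names are hypothetical, triples are plain tuples rather than serialized RDF, and the `po:` terms (`po:hasTask`, `po:connectedTo`, `po:used`, `po:wasGeneratedBy`, and so on) are invented placeholders, since the text does not list the actual PO classes and properties.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable

PO = "po:"  # hypothetical prefix; the real PO vocabulary is not given in the text


@dataclass
class Task:
    name: str
    func: Callable
    inputs: list[str]  # input port names
    output: str        # single output port name


@dataclass
class Workflow:
    name: str
    tasks: dict = field(default_factory=dict)
    links: dict = field(default_factory=dict)  # (consumer, input port) -> producer

    def add_task(self, task: Task) -> None:
        self.tasks[task.name] = task

    def connect(self, producer: str, consumer: str, port: str) -> None:
        self.links[(consumer, port)] = producer

    def design_triples(self) -> list[tuple]:
        """Describe the workflow design as (subject, predicate, object) triples."""
        triples = [(self.name, "rdf:type", PO + "Workflow")]
        for task in self.tasks.values():
            triples.append((task.name, "rdf:type", PO + "Task"))
            triples.append((self.name, PO + "hasTask", task.name))
        for (consumer, port), producer in self.links.items():
            source = f"{producer}.{self.tasks[producer].output}"
            triples.append((source, PO + "connectedTo", f"{consumer}.{port}"))
        return triples


def run(workflow: Workflow, run_id: str, external_inputs: dict):
    """Execute tasks in dependency order and collect provenance triples."""
    deps = {name: set() for name in workflow.tasks}
    for (consumer, _port), producer in workflow.links.items():
        deps[consumer].add(producer)

    results = {}
    log = [(run_id, "rdf:type", PO + "WorkflowRun"),
           (run_id, PO + "runOf", workflow.name)]
    for name in TopologicalSorter(deps).static_order():
        task = workflow.tasks[name]
        args = []
        for port in task.inputs:
            producer = workflow.links.get((name, port))
            if producer is None:  # port is fed from outside the workflow
                data_id = f"{run_id}/input/{name}.{port}"
                value = external_inputs[(name, port)]
            else:                 # port is fed by an upstream task's output
                data_id = f"{run_id}/{producer}.{workflow.tasks[producer].output}"
                value = results[producer]
            log.append((f"{run_id}/{name}", PO + "used", data_id))
            args.append(value)
        results[name] = task.func(*args)
        log.append((f"{run_id}/{name}.{task.output}", PO + "wasGeneratedBy",
                    f"{run_id}/{name}"))
    return results, log


# Two-task example: scale a list of numbers, then sum the scaled values.
wf = Workflow("wf1")
wf.add_task(Task("scale", lambda xs: [2 * x for x in xs], ["raw"], "scaled"))
wf.add_task(Task("total", sum, ["values"], "sum"))
wf.connect("scale", "total", "values")
results, log = run(wf, "run1", {("scale", "raw"): [1, 2, 3]})
```

In this sketch, chaining the `po:used` and `po:wasGeneratedBy` triples in `log` recovers which data products each result depends on, which is the kind of information a data dependency graph needs.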

2.2 Provenance Server

The provenance server provides three main functions, namely setup, record, and query, which are used by the workbench, the workflow engine, and the provenance explorer. Setup creates a relational database in MySQL and generates its schema based on the class and property information in PO. Record, given a set of RDF triples, uses an inference engine to infer new triples based on the predefined OWL semantics (e.g., class hierarchy) and additional inference rules defined for PO. The enhanced triple set is mapped to a set of relational tuples that is stored in the database. Query provides SQL and SPARQL interfaces to the provenance metadata stored in the database; SPARQL queries are translated into SQL queries, which the database executes over the relational data.

2.3 Provenance Explorer

VIEW can visualize various provenance graphs retrieved from the provenance database through the SPARQL or SQL query interface. In addition, VIEW can visualize static medical images or interactive 3-D graphical models of the intermediate or final data products of a workflow run. In Figure 3, using our provenance explorer, we (1) execute a SPARQL query to retrieve all RDF triples that describe a data dependency graph of a particular workflow run, (2) visualize this graph, and (3) visualize one of the data products as a 3-D brain model.

Figure 3. VIEW Provenance Explorer

3. VIEW Demonstration Scenario

Consider the following scenario. When an e-scientist starts a new project that may contain multiple workflows and workflow runs, the workbench calls setup to initialize a provenance database. While the e-scientist creates a workflow, the workbench collects the workflow description into a log and, once the workflow design is complete, stores it through the record interface. An existing workflow can evolve into a new workflow, augmenting the provenance database with new data. Similarly, while a workflow runs, the workflow engine collects its provenance into a log and, once the execution is finished, stores it in the database. Finally, the e-scientist can use the provenance explorer to retrieve the required provenance metadata (e.g., workflows, workflow runs, data dependency graphs, and workflow evolution) through the query interface for further browsing and visualization.

Acknowledgements

The authors would like to thank Dr. Otto Muzik at the PET Center of the School of Medicine, Wayne State University, for providing the human brain DTI datasets.

References

[1] Kepler. http://kepler-project.org.
[2] Pegasus. http://pegasus.isi.edu.
[3] Taverna. http://taverna.sourceforge.net.
[4] VisTrails. http://vistrails.sci.utah.edu.
