
Post-processing Workflows Using Data Grids to Support Hydrologic Modeling
Bakinam T. Essawy and Jonathan L. Goodall
Department of Civil and Environmental Engineering
University of Virginia

3rd CUAHSI Conference on Hydroinformatics July 17, 2015

VIC Output data set
Variable Infiltration Capacity (VIC) Macro-scale Hydrologic Model

VIC data set: http://boto.ocean.washington.edu/story/show/45

VIC Output data set on the iRODS server

Example of a flux file, named Fluxes_x_y, where x = latitude and y = longitude. Flux files contain information about moisture and energy fluxes at each time step for the three soil layers (top, middle, and deep).
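The naming convention above can be sketched in Python. This is a minimal illustration, not part of the original workflow: the function name `parse_flux_filename`, the `GridCell` type, and the sample file name are all hypothetical, and the only thing taken from the slide is the `Fluxes_<latitude>_<longitude>` pattern.

```python
from typing import NamedTuple


class GridCell(NamedTuple):
    """Centre of the grid cell that a flux file describes."""
    lat: float
    lon: float


def parse_flux_filename(name: str) -> GridCell:
    """Split a name of the form Fluxes_<lat>_<lon> into its coordinates.

    Assumption: the prefix is matched case-insensitively and the two
    coordinates are plain decimal numbers separated by underscores.
    """
    parts = name.rsplit("_", 2)
    if len(parts) != 3 or parts[0].lower() != "fluxes":
        raise ValueError(f"not a flux file name: {name!r}")
    return GridCell(lat=float(parts[1]), lon=float(parts[2]))


if __name__ == "__main__":
    # Hypothetical file name used only for illustration.
    print(parse_flux_filename("fluxes_37.0625_-79.4375"))
```

Splitting from the right keeps the negative sign attached to the longitude, which matters for cells in the western hemisphere.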

The VIC Model
VIC = Variable Infiltration Capacity, a regional-scale land surface hydrology model
• Developed at the University of Washington and Princeton; applied worldwide
• Spatial resolution: 1/8-degree grid cell
• Three layers of soil:
  • top layer (Layer 0, 0-10 cm)
  • mid layer (Layer 1, 10-30 cm)
  • lower layer (Layer 2, 30-100 cm)

Source: Gao et al. (2009)
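As a small sketch of how the three soil layers listed above could be represented in a post-processing script, the following Python snippet maps a depth to its layer index. The names `SOIL_LAYERS` and `layer_for_depth` are hypothetical, and treating each range as closed at the top and open at the bottom is an assumption; the slide does not say which layer a boundary depth belongs to.

```python
from typing import Optional

# (layer index, label, top depth in cm, bottom depth in cm), as listed on the slide.
SOIL_LAYERS = [
    (0, "top", 0.0, 10.0),
    (1, "mid", 10.0, 30.0),
    (2, "lower", 30.0, 100.0),
]


def layer_for_depth(depth_cm: float) -> Optional[int]:
    """Return the index of the layer containing depth_cm, or None if outside.

    Assumption: intervals are half-open, [top, bottom).
    """
    for index, _label, top, bottom in SOIL_LAYERS:
        if top <= depth_cm < bottom:
            return index
    return None


if __name__ == "__main__":
    for depth in (5, 20, 60, 150):
        print(depth, layer_for_depth(depth))
```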

County-level population data extracted from the Terra Populus website

Integrated Rule-Oriented Data System (iRODS)
• The iRODS-enabled DataNet Federation Consortium (DFC) is an NSF project that supports federation of both resources and services.
• This work is funded by the DFC project and uses a DFC data grid for storage of, and long-term access to, datasets held on heterogeneous resources.
• The DFC data grid also supports sharing of workflows, which makes the model results reproducible.

Workflow Structured Object (WSO)
• Within the iRODS data grid, a Workflow Structured Object (WSO) enables the execution of a workflow while capturing provenance information and archiving results.
• The workflow, the input files, and the output files can be shared.
• The workflow can be re-executed with new input files, and versions of the output files are saved automatically.

Objectives
• Demonstrate how different data transfer approaches can be used to connect cyberinfrastructure systems developed by different groups.
• Demonstrate how iRODS can provide federation across data grids.

Objectives (continued)
• Show how Amazon Web Services (AWS) can be used for computing, and how public repositories such as SEAD allow the data and modeling resources used in an analysis to be shared and uniquely identified.
• Work toward an approach to model reproducibility in which a scientist can easily share a model together with its inputs and outputs, so that others can benefit from it.

Main components and data flow in the post-processing system

WSO files used by WSO for creating the visualization

Workflow file

Shell Script

Python Scripts

Parameter File

Visualization

Two main directories store all files required by the WSO on the iRODS server.
• The location of the mounted WSO on the iRODS server
• The mounted collection
• The parameter file, the generated run file, and the output runDir created each time the run file is accessed

Components of one of the runDir directories associated with the WSO, such as the data staged in or out, the CSV files output by the Python scripts, and the stdout

The location where the shell script and the Python scripts are stored on the iRODS server

Executing the WSO installed on the hydrology grid from the client machine
• The user logs in to the client machine where the icommands are installed
• ils lists all the collections under the path /hydrology/home/bakinam
• icd changes to the collection where the WSO files are located
• Listing the mounted collection

Running the WSO: issuing the iget command on the generated .run file executes the workflow. The output message indicates that the WSO executed successfully.
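The client-side session described above (ils, icd, list the mounted collection, then iget on the .run file) can be sketched as a short Python helper that assembles the command sequence. This is an illustration under assumptions: the collection name `wso_mount` and the file name `workflow.run` are hypothetical placeholders, using `ils` for the final listing step is assumed, and only the home path /hydrology/home/bakinam comes from the slides. The helper only builds and prints the commands; running them requires a machine with the icommands installed and an authenticated iRODS session.

```python
import shlex
from typing import List


def wso_commands(home: str, wso_collection: str, run_file: str) -> List[List[str]]:
    """Build the icommand sequence that triggers a WSO run from a client.

    wso_collection and run_file are placeholders supplied by the caller;
    the slides do not give their actual names.
    """
    return [
        ["ils", home],                        # list collections under the home path
        ["icd", f"{home}/{wso_collection}"],  # move into the collection holding the WSO
        ["ils"],                              # list the mounted collection (assumed command)
        ["iget", run_file],                   # fetching the .run file executes the workflow
    ]


if __name__ == "__main__":
    commands = wso_commands(
        home="/hydrology/home/bakinam",
        wso_collection="wso_mount",   # hypothetical collection name
        run_file="workflow.run",      # hypothetical run file name
    )
    for command in commands:
        print(shlex.join(command))
```

Keeping each command as a list of arguments means the same sequence could later be passed to `subprocess.run` one command at a time without any shell quoting concerns.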

Conclusion
• Reproducible data visualizations on large hydrological data collections
• Strong and weak federation of data across communities (e.g., TerraPop interoperability example)
• Publishing data along with workflow-produced metadata using unique identifiers (e.g., SEAD interoperability example)

Future Plans
• Replace SEAD with HydroShare for sharing the datasets, and create a HydroShare resource type from the WSO.

Questions
Bakinam T. Essawy
Department of Civil and Environmental Engineering
University of Virginia
