
2006 International Workshop on Grid Computing Environments (GCE06), in Conjunction with SC06

Chair: Gregor von Laszewski, [email protected]
http://sc06.supercomputing.org/schedule/event_detail.php?evid=9233

Table of Contents

Grid Portal Development for Sensing Data Retrieval and Processing - Diego Arias, Mariana Mendoza, Fernando Cintron, Kennie Cruz, Wilson Rivera
Grid Portals for Bioinformatics - Lavanya Ramakrishnan, Mark S.C. Reed, Jeffrey L. Tilson, Daniel A. Reed
Science Gateways on the TeraGrid - Charlie Catlett, Sebastien Goasguen, Jim Marsteller, Stuart Martin, Don Middleton, Kevin Price, Anurag Shankar, Von Welch, Nancy Wilkins-Diehr
CoaxSim Grid: Building an Application Portal for a CFD Model - Byoung-Do Kim, Nam-gyu Kim, Jung-hyun Cho, Eun-kyung Kim, Joshi Fullop
Secure Federated Light-weight Web Portals for FusionGrid - D. Aswath, M. Thompson, M. Goode, X. Lee, N. Y. Kim
Portal-based Support for Mental Health Research - David Paul, Frans Henskens, Patrick Johnston, Michael Hannaford
WebGRelC: Towards Ubiquitous Grid Data Management Services - Giovanni Aloisio, Massimo Cafaro, Sandro Fiore, Maria Mirto
Workflow Management Through Cobalt - Gregor von Laszewski, Christopher Grubbs, Matthew Bone, David Angulo
Workflow-level Parameter Study Management in Multi-grid Environments by the P-GRADE Grid Portal - Peter Kacsuk, Zoltan Farkas, Gergely Sipos, Adrian Toth, Gabor Hermann
The Java CoG Kit Experiment Manager - Gregor von Laszewski, Phillip Zinny, Tan Trieu, David Angulo
A Deep Look at Web Services for Remote Portlets (WSRP) and WSRP4J - Xiaobo Yang, Rob Allen
Connected in a Small World: Rapid Integration of Heterogenous Biology Resources - Umut Topkara, Carol X. Song, Jungha Woo, Sang P. Park
My WorkSphere: Integrative Work Environment for Grid-unaware Biomedical Researchers and Applications - Zhaohui Ding, Yuan Luo, Xiaohui Wei, Chris Misleh, Wilfred W. Li, Peter W. Arzberger, Osamu Tatebe
A PERMIS-based Authorization Solution between Portlets and Back-end Web Services - Hao Yin, Sofia Brenes Barahona, Donald F. McMullen, Marlon Pierce, Kianosh Huffman, Geoffrey Fox
Kickstarting Remote Applications - Jens S. Vockler, Gaurang Mehta, Yong Zhao, Ewa Deelman, Mike Wilde
Real-time Storm Surge Ensemble Modeling in a Grid Environment - Lavanya Ramakrishnan, Brian O. Blanton, Howard M. Lander, Richard A. Luettich, Jr., Daniel A. Reed, Steven R. Thorpe
TeraGrid User Portal v1.0: Architecture, Design, and Technologies - Maytal Dahan, Eric Roberts, Jay Boisseau
Extending Grid Protocols onto the Desktop using the Mozilla Framework - Karan Bhatia, Brent Stearn, Michela Taufer, Richard Zamudio, Daniel Catarino
Mehmet A. Nacar, Marlon Pierce, Gordon Erlebacher, Geoffrey C. Fox

Abstract: This workshop will focus on projects and technologies that are adopting scientific portals and gateways. These technologies are characterized by delivering well-established mechanisms for providing familiar interfaces to secure Grid resources, services, applications, tools, and collaboration services for communities of scientists. In most cases access is enabled through a web browser without the need to download or install any specialized software or worry about networks and ports. As a result, the science application user is isolated from the complex details and infrastructure needed to operate an application on the Grid. Additional information about this workshop is available at http://www.cogkit.org/GCE06


Grid Portal Development for Sensing Data Retrieval and Processing Diego Arias, Mariana Mendoza, Fernando Cintron, Kennie Cruz, and Wilson Rivera Parallel and Distributed Computing Laboratory University of Puerto Rico at Mayaguez P.O.Box 9042, Mayaguez, Puerto Rico 00681, USA

Abstract

This paper presents our experiences developing grid portals for radar and sensor based applications. Underlying these gateways are existing grid technologies such as the Globus Toolkit 4.0.1 and Gridsphere. The grid portals provide secure and transparent access to applications dealing with data acquired from a network of radars and sensors deployed in Puerto Rico, while implementing useful functionalities for data management and analysis.

1 Introduction

Grid computing [1] involves coordination, storage and networking of resources across dynamic and geographically dispersed organizations in a way that is transparent to users. The Open Grid Services Architecture (OGSA) [2], based upon standard Internet protocols such as SOAP (Simple Object Access Protocol) and WSDL (Web Services Description Language), is becoming a standard platform for grid services development. Operational grids based on these technologies are feasible now, and a large number of grid prototypes are already in place (e.g. the Grid Physics Network (GriPhyN) and TeraGrid, among many others). Although applications can be built using basic grid services, this low-level activity requires detailed knowledge of protocols and component interactions. In contrast, grid portals hide this complexity via easy-to-use interfaces, creating gateways to computing resources. An effective grid portal provides tools for user authentication and authorization, application deployment, configuration and execution, and management of distributed data sets.

The Open Grid Computing Environments (OGCE) portal software is the most widely used toolkit for building reusable portal components that can be integrated in a common portal container system. The OGCE portal toolkit includes X.509 Grid security services, remote file and job management, information and collaboration services, and application interfaces. The OGCE portal toolkit is based on the notion of a "portlet," a portal server component that controls a user-configurable panel. A portal server supports a set of web browser frames, each containing one or more portlets that provide a service. This portlet component model allows one to construct portals merely by instantiating a portal server with a domain-specific set of portlets, complemented by domain-independent portlets for collaboration and discussion. Using the toolkit, one wraps each grid service with a portlet interface, creating a "mix and match" palette of portlets for portal creation and customization. Recently, there have been significant advances in grid portal technologies and in the development of scientific grid interfaces [3, 4]. This paper presents our experiences developing grid portals for radar and sensor based applications. The organization of the paper is as follows. Section 2 briefly discusses the grid testbed infrastructure deployed at the University of Puerto Rico to investigate issues related to grid computing. Sections 3 and 4 describe the applications and the grid portal developments. Section 5 discusses related work.
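The portlet model described above maps directly onto the JSR 168 portlet API that containers such as Gridsphere and the OGCE portal support. The sketch below is a minimal, hypothetical grid-service portlet; the class name and the rendered content are illustrative only and are not taken from the OGCE codebase.

```java
import java.io.IOException;
import java.io.PrintWriter;

import javax.portlet.GenericPortlet;
import javax.portlet.PortletException;
import javax.portlet.RenderRequest;
import javax.portlet.RenderResponse;

/**
 * Minimal JSR 168 portlet sketch: one user-configurable panel that a
 * portal container (e.g. Gridsphere) can place on a page alongside
 * other domain-specific or collaboration portlets.
 */
public class GridServiceStatusPortlet extends GenericPortlet {

    @Override
    protected void doView(RenderRequest request, RenderResponse response)
            throws PortletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();

        // A real grid portlet would call a backing grid service here
        // (job submission, file transfer, ...) and render its results.
        String user = request.getRemoteUser();
        out.println("<p>Grid service panel for "
                + (user == null ? "guest" : user) + "</p>");
    }
}
```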

2 Grid Test-bed Infrastructure

The PDCLab Grid Testbed, deployed at the University of Puerto Rico-Mayaguez, is an experimental grid designed to address research issues such as the effective integration of sensor and radar networks into grid infrastructures. The PDCLab grid test-bed components run CentOS 4.2 and the Globus Toolkit 4.0.1. The Globus Toolkit includes, among other components, services such as a security infrastructure (GSI), a data transport service (GridFTP), execution services (GRAM), and information services (MDS). The Grid Security Infrastructure is used by the Globus Toolkit for authentication and secure communication. GSI is implemented using public key encryption, X.509 certificates, and the Secure Sockets Layer (SSL) communication protocol, and incorporates single sign-on and delegation.

The Monitoring and Discovery Service (MDS) is used to discover, publish and access both static and dynamic information from the different resources in a computational grid. MDS uses the Lightweight Directory Access Protocol (LDAP) to access such information on the different grid components and provides a unified view of the disparate grid resources. The Globus Resource Allocation Manager (GRAM) is used for allocation and management of resources on the computational grid, using a Resource Specification Language (RSL) to request resources. GRAM also updates the MDS with information about the availability of grid resources. The GRAM API can be used to submit a job, query the status of a job, and cancel a job. A GRAM service runs on each resource that is part of the grid and is responsible for interfacing with the local site resource management system (e.g. OpenPBS, Condor). GridFTP is a secure, high-performance and robust data transfer mechanism used to access remote data. In addition to GridFTP, Globus provides the Globus Replica Catalog to maintain a catalog of dataset replicas so that, instead of duplicating large datasets, only the necessary pieces of the datasets are stored on local hosts. The Globus Replica Management software provides replica management capabilities for data grids by integrating the replica catalog and GridFTP.

The computational resources available on the grid testbed (see Figure 1) include:

- An IBM xSeries Linux cluster with 64 nodes, dual-processor at 1.2 GHz, 53GB of memory and 1TB of storage;
- Eight (8) IA-64 Itanium servers, dual-processor at 900 MHz, each with 8GB of memory and 140GB of SCSI Ultra 320 storage;
- Two (2) IA-32 Pentium IV servers, dual-processor at 3 GHz, each with 1GB of memory and 120GB of ATA-100 storage;
- One (1) IA-32 Pentium III server, dual-processor at 1.2 GHz, with 2GB of memory and 140GB of SCSI Ultra 160 storage;
- One (1) IA-32 Xeon server, dual-processor at 2.8 GHz, 1MB L2 cache, with 1GB of memory and one 230GB RAID storage array (STB Server); and
- Two (2) PowerVault storage arrays with 8TB.

Figure 1: Grid-service based Infrastructure
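Because the pre-web-service MDS exposes resource information over LDAP, it can be queried with any LDAP client. The sketch below uses standard JNDI; the host name, port and base DN are assumptions (2135 and "mds-vo-name=local, o=grid" were common GRIS defaults) and would need to match the actual testbed configuration.

```java
import java.util.Hashtable;

import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

/** Query a (pre-WS) MDS information server over LDAP using plain JNDI. */
public class MdsQuery {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        // Assumed GRIS endpoint; replace with the testbed's actual MDS host and port.
        env.put(Context.PROVIDER_URL, "ldap://grid.example.edu:2135");

        DirContext ctx = new InitialDirContext(env);
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);

        // Assumed default base DN for a local GRIS.
        NamingEnumeration<SearchResult> results =
                ctx.search("mds-vo-name=local, o=grid", "(objectclass=*)", controls);
        while (results.hasMore()) {
            SearchResult entry = results.next();
            System.out.println(entry.getNameInNamespace());
        }
        ctx.close();
    }
}
```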

3 The Student Test-Bed (STB) Grid Portal

The CASA1 project is an NSF Engineering Research Center investigating the design and implementation of a dense network of low-power meteorological radars whose goal is to collaboratively and adaptively sense the lowest few kilometers of the earth's atmosphere. We have deployed a grid-service based tool to access and manipulate radar data from a radar network. Access to this infrastructure is provided via a grid portal interface. The developed grid portlets provide a presentation layer for manipulating both processed data and raw-data from the radars, and for exposing services to end-users. Additionally, the visualization of weather information is also implemented via portlets.

1 http://casa.umass.edu/

The portal presentation layer and the core portlets included in the basic installation are provided by Gridsphere. Figure 2 shows the customized STB portal. Gridsphere provides portlets for managing user accounts inside the portal framework. This set of portlets is integrated into the STB portal design to control access to certain resources and services, as explained further on.

Figure 3: Data management portlet

Figure 2: STB Grid portal interface

Users can access raw-data from the radars through the grid portal. Files containing the raw-data are stored using the NetCDF2 format. The data management portlets allow end-users to download the data in such a way that they obtain an exact copy of a file or a set of files. To avoid server overload, raw data requests are restricted to registered users only. Raw-data does not provide comprehensible information; it requires additional tools for extraction and processing. As a result, this feature is designed for advanced users (students, teachers and researchers) who have the adequate software and prior knowledge of radar data. These users can request an account from the portal administrator.

Figure 3 shows the data management portlets. Once the user has logged into the portal, the raw data request portlets become available. The initial portlet shows a selection form that permits choosing the date of interest; all available data for that date is then listed. The selected data set can be downloaded as a compressed file. The grid portal also provides current rainfall estimates over the western area of Puerto Rico through reflectivity displays. This information is unrestricted and available to anyone who accesses the portal. Figure 4 shows how the base reflectivity information corresponding to a sweep is plotted over the Puerto Rico west coast area.

Figure 4: Base Reflectivity portlet.

2 http://www.unidata.ucar.edu/software/netcdf/

Figure 5: Base Reflectivity animation portlet

Figure 5 shows a portlet used to display a set of base reflectivity sweeps over the Puerto Rico west coast area. This portlet animates the data set and includes loop controls and zooming. The base reflectivity loop is useful for tracking meteorological phenomena. NetCDF data is written as binary files; it cannot be read by users as plain text, and specialized software is required for its interpretation. There are several libraries, plugins, programs and a variety of tools to manipulate NetCDF files. However, installation, configuration and usage of these tools can be very complex for inexperienced users. Additionally, due to the flexibility of the format, the structure of the files varies depending on the implementation procedure. To perform a specific task, one or more software tools are needed. For example, there is no available software to generate reflectivity plots from the radar raw-data, so a Java class was developed using more basic classes and libraries for NetCDF manipulation. Additionally, a similar class was developed to convert NetCDF to ASCII. To facilitate the manipulation of raw-data from DCAS network nodes, two very useful services were implemented. These services allow end-users to execute processes over the raw-data available in the storage system. Thus, users can upload their data sets from a local machine to the server and process them. The available processes are:

NCtoJPG: Rainfall rate plots are available in the grid portal, but older plots are not kept in safe storage. Using the grid portal, users can send out-of-date data sets to the grid and then receive the corresponding reflectivity plots.

NCtoASCII: Through the grid portal, users can convert NetCDF files to text files. This tool eliminates the need for extra software for data manipulation (a minimal conversion sketch appears after this list).
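The NCtoASCII class itself is not reproduced in the paper, so the following is only a rough sketch of the idea, written against the Unidata NetCDF-Java (ucar.nc2) API: open a file, walk its variables and dump their values as text. The output layout is illustrative.

```java
import java.io.PrintWriter;
import java.util.Arrays;

import ucar.ma2.Array;
import ucar.ma2.IndexIterator;
import ucar.nc2.NetcdfFile;
import ucar.nc2.Variable;

/** Rough NetCDF-to-ASCII sketch using the Unidata NetCDF-Java (ucar.nc2) library. */
public class NcToAscii {
    public static void main(String[] args) throws Exception {
        NetcdfFile nc = NetcdfFile.open(args[0]);        // input radar sweep file
        PrintWriter out = new PrintWriter(args[1]);      // plain-text output file
        try {
            for (Variable v : nc.getVariables()) {
                out.println("# " + v.getFullName() + " shape="
                        + Arrays.toString(v.getShape()));
                Array data = v.read();                   // read whole variable into memory
                IndexIterator it = data.getIndexIterator();
                while (it.hasNext()) {                   // dump values one per token
                    out.print(it.getObjectNext());
                    out.print(' ');
                }
                out.println();
            }
        } finally {
            out.close();
            nc.close();
        }
    }
}
```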

Services for end-users involve executing a process over a single file or a set of files. For instance, a set of NetCDF files can be uploaded with an NCtoJPG request. The data is processed and the output files are made available for downloading through the grid portal. This entire procedure is transparent to users. The server could process each file and then return the output files; however, the server may also be receiving data from the radar network or serving other user requests at the same time. To avoid a crash due to an overload of simultaneous tasks, remote job execution is introduced. The server can submit a simple job or a multi-job to the grid testbed instead of performing only local jobs. Job submission is supported by Globus through GRAM. Additionally, PBS (Portable Batch System) is used as the job scheduler. Figure 6 shows the job submission functionality.

Figure 6: Job submission architecture

As shown in Figure 7, an important observation when submitting multiple jobs is that CPU consumption on the local server is very high (around 97%) when a job is executed locally, and close to 1% when the job executes on the STB grid infrastructure.
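The submission code used by the STB server is not shown in the paper; the sketch below only illustrates the general pattern of describing such a job with a pre-WS GRAM RSL string and handing it to a small helper interface. The RSL attributes, paths and the GridJobSubmitter interface are assumptions for illustration; in practice the submission would go through a GRAM client such as the Java CoG Kit or the globusrun command, with PBS as the local scheduler behind the gatekeeper.

```java
/**
 * Illustrative sketch of offloading an NCtoJPG conversion to the grid
 * testbed through GRAM. The RSL string and the submitter interface are
 * hypothetical stand-ins for the actual GRAM client code.
 */
public class NcToJpgJob {

    /** Hypothetical wrapper around a GRAM client (e.g. the Java CoG Kit). */
    interface GridJobSubmitter {
        String submit(String gatekeeperContact, String rsl) throws Exception;
    }

    static String buildRsl(String inputFile) {
        // Classic pre-WS RSL; attribute values are illustrative and should be
        // checked against the local gatekeeper/PBS configuration.
        return "&(executable=/opt/stb/bin/nctojpg)"
             + "(arguments=" + inputFile + ")"
             + "(count=1)"
             + "(jobtype=single)"
             + "(stdout=nctojpg.out)(stderr=nctojpg.err)";
    }

    public static void main(String[] args) throws Exception {
        String input = args.length > 0 ? args[0] : "radar_sweep.nc";

        // Dummy submitter so the sketch runs standalone; a real one would call GRAM.
        GridJobSubmitter submitter = (contact, rsl) -> {
            System.out.println("submit to " + contact + ":\n" + rsl);
            return "job-0001"; // placeholder job handle
        };

        String handle = submitter.submit("headnode.example.edu/jobmanager-pbs",
                                         buildRsl(input));
        System.out.println("submitted, handle = " + handle);
    }
}
```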

Figure 7: Percentage of CPU usage (NetCDF to JPG) for jobs run on the local server versus the STB grid

4 The WALSAIP Grid Portal

The NSF WALSAIP3 project is developing a new conceptual framework for the automated processing of information arriving from physical sensors in a generalized wide-area, large-scale distributed network infrastructure. The project focuses on water-related ecological and environmental applications, and it addresses issues such as scalability, modularity, signal representation, data coherence, data integration, distributed query processing, scheduling, computer performance, network performance, and usability. A distributed sensor network testbed is being developed at Puerto Rico's Jobos Bay National Estuarine Research Reserve (JBNERR)4. The reserve covers more than 2800 acres on the southern coast of Puerto Rico, between the municipalities of Guayama and Salinas. It is administered by the National Oceanic and Atmospheric Administration and managed locally by the Department of Natural and Environmental Resources.

One of the components of this project is a grid-based tool to define workflow compositions of signal processing operators as an application service. This tool allows the composition of operators that may be geographically distributed and provided by diverse administrative domains. Again, underlying this tool are existing grid technologies such as Globus Toolkit 4.0 and Gridsphere. The design of the methodology for composing distributed signal operators follows two major requirements. First, it is desirable to optimize resource management according to the complexity of the operators to be processed. Second, the composition of distributed resources requires metadata distribution and management mechanisms.

3 http://walsaip.uprm.edu/
4 http://nerrs.noaa.gov/JobosBay/

The Grid Portal Interface provides transparent and secure access to end-users. This portal allows end-users to define signal processing workflows using drag-and-drop functionality. GridFTP is used to improve data transport from the data server (WALSAIP Server) to the grid portal server (PDC Server). Signal processing operators are deployed as grid services. These grid services may be geographically distributed and provided by different administrative domains. Figure 8 depicts the components of the application and Figure 9 illustrates the grid portal interface.
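The operator interface itself is not given in the paper, so the following is a hypothetical sketch of how distributed signal-processing operators could be modelled and chained into a workflow. The names are invented for illustration; in the WALSAIP tool each operator would actually be a remote grid service invocation rather than a local lambda.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of composing distributed signal-processing operators. */
public class OperatorPipeline {

    /** One operator; in the real system this would wrap a grid service call. */
    interface SignalOperator {
        double[] apply(double[] signal);
    }

    private final List<SignalOperator> stages;

    OperatorPipeline(SignalOperator... stages) {
        this.stages = Arrays.asList(stages);
    }

    double[] run(double[] signal) {
        double[] current = signal;
        for (SignalOperator op : stages) {
            current = op.apply(current);  // each stage may execute in a different domain
        }
        return current;
    }

    public static void main(String[] args) {
        SignalOperator detrend = s -> {
            double mean = Arrays.stream(s).average().orElse(0.0);
            return Arrays.stream(s).map(x -> x - mean).toArray();
        };
        SignalOperator rectify = s -> Arrays.stream(s).map(Math::abs).toArray();

        OperatorPipeline pipeline = new OperatorPipeline(detrend, rectify);
        System.out.println(Arrays.toString(pipeline.run(new double[]{1.0, 2.0, 3.0})));
    }
}
```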

Figure 8: Signal processing application services over a grid environment

Figure 9: WALSAIP grid portal

5 Related Work

The Linked Environments for Atmospheric Discovery (LEAD) project [5] proposes an information technology framework for assimilating, forecasting, managing, analyzing, mining and visualizing a broad array of meteorological data and model output independent of format and physical location. LEAD is currently led by nine institutions. The LEAD system is dynamically adaptable in terms of time, space, forecasting and processing. The LEAD infrastructure includes technologies and tools such as the Globus toolkit, Unidata's Local Data Manager (LDM), the Open-source Project for a Network Data Access Protocol (OPeNDAP) and the OGSA Data Access and Integration (OGSA-DAI) service. The LEAD portal is based on OGCE.

Majithia et al. [6] proposed Triana, a framework that allows users to graphically create complex service compositions based on BPEL4WS (Business Process Execution Language for Web Services). It also allows users to easily carry out "what-if" analyses by altering existing workflows. Using this framework, it is possible to execute the composed service graph on a Grid network. Gao et al. [7] developed a service composition architecture that optimizes the aggregate bandwidth utilization within operator networks. A general service composition model is proposed to capture the loosely coupled interaction among service components as well as the estimated traffic that flows among them. Glatard et al. [8] discussed how to build complex applications by reusing and assembling scientific codes on a production grid infrastructure. The authors identify two paradigms for executing application code on a grid: a task-based approach, associated with global computing and characterized by its efficiency, and a service-based approach, developed in the metacomputing and Internet communities and characterized by its flexibility.

References

1. I. Foster and C. Kesselman (1998), "The Grid: Blueprint for a Future Computing Infrastructure," Morgan Kaufmann Publishers.
2. I. Foster, C. Kesselman, J. Nick, and S. Tuecke (2002), "The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration," Technical report, Open Grid Service Infrastructure WG, Global Grid Forum.
3. D. Gannon, G. Fox, M. Pierce, B. Plale, G. von Laszewski, C. Severance, J. Hardin, J. Alameda, M. Thomas, J. Boisseau, "Grid Portals: A Scientist's Access Point for Grid Services," GGF Community Practice document, working draft 1, September 2003.
4. G. von Laszewski, J. Gawor, S. Krishnan, and K. Jackson, "Commodity Grid Kits - Middleware for Building Grid Computing Environments," in Grid Computing: Making the Global Infrastructure a Reality, pp. 639-656, Wiley, 2003.
5. K. K. Droegemeier, D. Gannon, D. Reed, B. Plale, J. Alameda, T. Baltzer, K. Brewster, R. Clark, B. Domenico, S. Graves, E. Joseph, D. Murray, R. Ramachandran, M. Ramamurthy, L. Ramakrishnan, J. A. Rushing, D. Weber, R. Wilhelmson, A. Wilson, M. Xue, and S. Yalda, "Service-Oriented Environments for Dynamically Interacting with Mesoscale Weather," Computing in Science & Engineering, 7(6):12-29, Nov.-Dec. 2005.
6. S. Majithia, M. Shields, I. Taylor, I. Wang, "Triana: A Graphical Web Service Composition and Execution Toolkit," IEEE International Conference on Web Services (ICWS 2004), San Diego, California, USA, 2004.
7. X. Gao, R. Jain, Z. Ramzan, U. Kozat, "Resource Optimization for Web Service Composition," Proceedings of IEEE SCC 2005, 2005.
8. T. Glatard, J. Montagnat, X. Pennec, "Efficient Services Composition for Grid-enabled Data-intensive Applications," Proceedings of the IEEE International Symposium on High Performance Distributed Computing (HPDC'06), Paris, France, 2006.

Real-time Storm Surge Ensemble Modeling in a Grid Environment

Lavanya Ramakrishnan1, Brian O. Blanton4, Howard M. Lander1, Richard A. Luettich, Jr.3, Daniel A. Reed1, Steven R. Thorpe2
1 Renaissance Computing Institute, 2 MCNC, 3 UNC Chapel Hill Institute of Marine Sciences, 4 Science Applications International Corporation
{lavanya, howard, dan_reed}@renci.org, [email protected], [email protected], [email protected]

Abstract

Natural disasters such as hurricanes heavily impact the US East and Gulf coasts. This creates the need for large-scale modeling in the areas of meteorology and ocean sciences, coupled with an integrated environment for analysis and information dissemination. In turn, this means there is an increased need for large-scale distributed high performance resources and data environments. In this paper, we describe a framework that allows a storm surge model, ADCIRC, to be run in a distributed Grid environment. This framework was developed as a component of the Southeastern Universities Research Association's (SURA) Southeastern Coastal Ocean Observing and Prediction (SCOOP) program. SCOOP is creating an open-access grid environment for the southeastern coastal zone to help integrate regional coastal observing and modeling systems. Specifically, this paper describes a set of techniques used for resource selection and fault tolerance in a highly variable ad-hoc Grid environment. The framework integrates domain-specific tools and standard Grid and portal tools to provide an integrated environment for forecasting and information dissemination.

1. Introduction

Year after year, the US East and Gulf coasts are heavily impacted by hurricane activity causing a large number of deaths and billions of dollars in economic losses. For example, in 2005 there were 14 hurricanes, exceeding the record of 12 in 1969, out of which 7 were considered major hurricanes [9]. To help reduce the impact of hurricanes, there is a need for an integrated response system that enables virtual communities [1] to evaluate, plan and react to such natural phenomena. The integrated system needs to handle real-time data feeds, schedule and execute a set of model runs, manage the model input and output data, and make results and status available to the larger audience. In addition, to enhance the scientific validity of the models there is a need to be able to recreate scenarios and re-run the models for retrospective analysis[19]. The large-scale modeling and analysis has driven the use of high performance resources and Grid environments for such problems.

In this paper, we describe the distributed software infrastructure used to run a storm surge model in a Grid environment. The sensitivity to timely model completion drives the need for specific techniques for resource management and increased fault tolerance when the models run in a distributed Grid environment. This framework was developed as a component of the Southeastern Universities Research Association's (SURA) Southeastern Coastal Ocean Observing and Prediction (SCOOP) program[20]. The SCOOP program is a distributed project that includes the Gulf of Maine Ocean Observing System, Bedford Institute of Oceanography, Louisiana State University, Texas A&M, University of Miami, University of Alabama in Huntsville, University of North Carolina, University of Florida and Virginia Institute of Marine Science. SCOOP is creating an open-access grid environment for the southeastern coastal zone to help integrate regional coastal observing and modeling systems. Specifically, our effort in this program is focused on two main areas: 1) storm surge modeling for the southeast coast; and 2) experimenting with novel techniques to use grid resources to meet the real-time constraints of the application. The storm surge component uses the Advanced Circulation (ADCIRC)[12] model that computes tidal and storm surge water levels and currents, forced by tides and winds. While our framework was developed in the context of ADCIRC, the solution is more general and is applicable for running other models and applications in grid environments. In fact, the framework is currently being applied to other models in the context of the North Carolina Forecasting System[22].

Our solution builds on existing standard grid and portal technologies, including the Globus toolkit [2] and the Open Grid Computing Environment (OGCE)[4], and lessons learned from grid computing efforts in other science domains, such as bioinformatics[21], astronomy[5] and other projects.

A portal provides the front-end interface for users to interact with the ocean observing and modeling system. Users can conduct retrospective analyses, access historical data from previous model runs and observe the status of daily forecast runs from the portal. The real-time data for the ensemble forecast arrives through Unidata's Local Data Manager (LDM)[15], an event-driven data distribution system that selects, captures, manages and distributes meteorological data products. Once all the data for a given ensemble member has been received, available and suitable grid resources are discovered using a simple resource selection algorithm. The model run is then executed and the output data is staged back to the originating site. The final ensemble result of the surge computations is inserted back into the SCOOP LDM stream for subsequent analysis and visualization by other SCOOP partners [18].

In this paper, we describe the interaction of the Grid components and the specific techniques used for resource selection and fault tolerance during model execution. The rest of the paper is organized as follows. In §2 the science drivers are described in greater detail. We describe our design philosophy in §3. The architecture and technology components are presented in §4 and §5, experiences from our system and related work in §6 and §7, and we present our conclusions and future work in §8.

2. Science Drivers

Before we detail our design and techniques, we present a brief description of the science elements that motivate our decisions. As mentioned earlier, for the storm-surge forecasts we use the tidal and storm-surge model ADCIRC[12]. ADCIRC is a finite element model that solves the shallow-water generalized wave-continuity equations for a thin fluid layer on a rotating platform. The ADCIRC model is parallelized using the Message Passing Interface (MPI). In the current implementation, we use a relatively coarse representation of the western North Atlantic Ocean. Figure 1 shows a 32-processing-element decomposition of this ADCIRC grid.

Figure 1. Domain decomposition of a high-resolution ADCIRC grid used in the SCOOP computational system.

Storm surge modeling requires assembling input meteorological and other data sets, running models, processing the output and distributing the resulting information. In terms of modes of operation, most meteorological and ocean models can be run in "hindcast" mode, after the fact of a major storm or hurricane, for post-analysis or risk assessment, or in "forecast" mode for prediction to guide evacuation or operational decisions[19]. The forecast mode is driven by real-time data streams while the hindcast mode is initiated by a user. Our framework is designed to support both these usage models for running ADCIRC and other models in a Grid environment.

Further, it is often necessary to run the ADCIRC model with different forcing conditions to analyze forecast accuracy. This results in a large number of parallel model runs, creating an ensemble of forecasts. The meteorological modeling community has long recognized that a consensus forecast, based on an ensemble of forecasts, generally has better statistical forecast skill than any one of the ensemble members[14, 11]. Thus, we have taken an ensemble approach to storm-surge forecasting that requires access to a large number of computational clusters, coordinated access to data and computational resources, and the ability to leverage additional resources that may become available over time.

Our operational cycle is tied to the typical 6-hour synoptic forecast cycle used by the National Weather Service and the National Centers for Environmental Prediction (NCEP). NCEP computes an atmospheric analysis and forecast four times per day, for which the forecast initialization times are 00Z, 06Z, 12Z, and 18Z. As ADCIRC solves discrete versions of partial differential equations, both initial and boundary conditions are required for each simulation. Boundary conditions include the wind stress on the ocean surface (an ensemble member, described below) and tidal elevations. The initial conditions for each simulation are taken from a previously computed "hindcast" that is designed to keep the dynamic model up to date with respect to the analyzed atmospheric model state. This is called hot-starting the model. For each synoptic cycle, a hot-start file is computed that brings the model state forward in time from the beginning of the previous cycle to the start of the current forecast cycle (Figure 2).

Figure 2. Timeline showing the computation of a hotstart file and a subsequent forecast. On Day K, the hotstart computed "yesterday" (Day K-1) is used to bring the hotstart sequence up to date, and an 84-hour forecast is subsequently computed. This same hotstart file is used "tomorrow" (Day K+1) to start the sequence over again.

The wind field boundary conditions for each simulation are taken from a variety of sources, each of which constitutes one member of the ensemble. In addition to the atmospheric model forecasts provided by NCEP, the SCOOP project also uses tropical storm forecast tracks from the National Hurricane Center to synthesize "analytic" wind fields. Each forecast track is statistically perturbed and an analytic vortex model[13] is used to compute the wind and pressure fields for each track. In the SCOOP project, this service is provided by the University of Florida and the wind files arrive through LDM. We are currently investigating the skill of this ensemble approach, and results will appear in a separate communication.
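As a concrete illustration of the hot-start bookkeeping, the small sketch below maps a synoptic forecast cycle (00Z, 06Z, 12Z or 18Z) to the previous cycle whose hotstart file initializes it. The file-naming convention is an assumption for illustration; the actual SCOOP conventions are not given in the paper.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

/** Maps a 6-hour synoptic forecast cycle to the hotstart cycle that precedes it. */
public class HotstartCycle {

    private static final DateTimeFormatter TAG =
            DateTimeFormatter.ofPattern("yyyyMMddHH'Z'");

    /** The hotstart that initializes a forecast is computed for the previous 6-hour cycle. */
    static ZonedDateTime hotstartCycleFor(ZonedDateTime forecastCycle) {
        return forecastCycle.minusHours(6);
    }

    /** Hypothetical naming convention for the hotstart file of a given cycle. */
    static String hotstartFileName(ZonedDateTime cycle) {
        return "adcirc_hotstart_" + TAG.format(cycle) + ".dat";
    }

    public static void main(String[] args) {
        ZonedDateTime cycle = ZonedDateTime.of(2006, 9, 1, 12, 0, 0, 0, ZoneOffset.UTC);
        System.out.println("forecast cycle : " + TAG.format(cycle));
        System.out.println("initialized by : "
                + hotstartFileName(hotstartCycleFor(cycle)));
    }
}
```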

3. Design Philosophy

The need for timely access to high performance resources for the large suite of ensemble runs makes it important to have a distributed, fault-tolerant Grid environment for these model runs. Based on earlier experience in storm surge modeling and the lessons learned from other interdisciplinary Grid efforts, we identified a set of higher-level design principles that guided the architecture and implementation of the system.

Scalable real-time system: As discussed earlier, ensemble modeling can increase forecast accuracy. Running multiple high-resolution, large-scale simulations requires a scalable and distributed real-time system. Thus, our system is based on Grid technologies and standards, allowing us to leverage access to ad-hoc resources that may become available.

Extensible: While this effort has been largely focused on the SCOOP ADCIRC model, our goal is to build a modular architecture able to support other applications and add additional resources as they become available.

Adaptable: The criticality and timeliness aspects of the science and the variability of grid environments require the infrastructure to be adaptable at various levels. The infrastructure needs active monitoring and adaptation components that can react to these changes and ensure successful completion of the models using fault tolerance and failure recovery techniques.

Based on these underlying design principles, we are focused on building a framework that can be used for real-time storm surge ensemble modeling on the Grid, triggered by the arrival of wind data. The required timeliness of the model runs makes it important to address the following issues on the Grid: a) real-time discovery of available resources; b) managing the model run on an ad-hoc set of resources; and c) continuous monitoring and adaptation to allow the system to be resilient to the variability in Grid environments.

4. Data and Control Flow of the NC SCOOP System

The ADCIRC storm surge model can be run in two modes. The "forecast" mode is triggered by the real-time arrival of wind data from different sites through the Local Data Manager[15]. In the "hindcast" mode, the modeler can either use a portal or a shell interface to launch the jobs to investigate prior data sets (post-hurricane).

Figure 3 shows the architectural components and the control flow for the NC SCOOP system:

1. In the forecast run the wind data arrives at the local data manager (Step 1.F in Figure 3). In our current setup, the system receives wind files from the University of Florida and Texas A&M. Alternatively, a scientist might log into the portal and choose the date and the corresponding data to re-run a model (Step 1.H in Figure 3).
2. In the hindcast run, the application coordinator locates relevant files using the SCOOP catalog at UAH[23] and retrieves them from the SCOOP archives located at TAMU and LSU[17]. In the forecast runs, once the wind data arrives, the application coordinator checks to see if the hotstart files are available locally or at the remote archive. If they are not available and not currently being generated (through a model run), a run is launched to generate the corresponding hotstart files to initialize the model for the current forecast cycle.
3. Once the model is ready to run (i.e. all the data is available), the application coordinator uses the resource selection component to select the best resource for this model run.
4. The resource selection component queries the status at each site and ranks the resources, accounting for queue delays and network connectivity between the resources.
5. The application coordinator then calls an application-specific component that prepares an application package that can be shipped to remote resources. The application package is customized with specific properties for the application on a particular resource and includes the binary, the input files and other initialization files required for the model run.
6. The self-extracting application package is transferred to the remote resource and the job is launched using standard grid mechanisms.
7. Once the application coordinator receives the "job finished" status message, it retrieves the output files from the remote sites.
8. In the hindcast mode, the results are then made available through the portal (Step 8.H in Figure 3). Additionally, in the forecast mode, we push the data back through LDM (Step 8.F in Figure 3). The data is then archived and visualized by other SCOOP partners downstream.
9. The application coordinator publishes status messages at each of the above steps to a centralized messaging broker. Interested components such as the portal can subscribe to relevant messages to receive real-time status notification of the job run.
10. In addition, the resource status information is collected across all the sites and can be observed through the portal as well as used for more sophisticated resource selection algorithms.

Figure 3. The control flow through the various components of the architecture
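To make the ten-step control flow above concrete, here is a compact, hypothetical sketch of the Application Coordinator's main sequence. The component interfaces are invented for illustration; the real coordinator uses GRAM, GridFTP, MyProxy and WS-Messenger as described in the next section.

```java
/**
 * Hypothetical outline of the Application Coordinator sequence described
 * in steps 1-10 above: gather inputs, pick a resource, stage and run the
 * packaged model, retrieve output and publish status along the way.
 */
public class ApplicationCoordinator {

    interface ResourceSelector { String selectBestSite(String modelId); }
    interface Packager        { String buildPackage(String modelId, String site); }
    interface GridClient {
        void stageIn(String site, String packagePath);
        void runJob(String site, String packagePath);   // blocks until "job finished"
        void stageOut(String site, String outputDir);
    }
    interface StatusPublisher { void publish(String event); }

    private final ResourceSelector selector;
    private final Packager packager;
    private final GridClient grid;
    private final StatusPublisher status;

    ApplicationCoordinator(ResourceSelector s, Packager p, GridClient g, StatusPublisher st) {
        this.selector = s; this.packager = p; this.grid = g; this.status = st;
    }

    /** Triggered either by LDM data arrival (forecast) or by the portal (hindcast). */
    void runEnsembleMember(String modelId, String outputDir) {
        status.publish("input data arrived: " + modelId);

        String site = selector.selectBestSite(modelId);        // steps 3-4
        String bundle = packager.buildPackage(modelId, site);  // step 5

        grid.stageIn(site, bundle);                            // step 6
        status.publish("task started: " + modelId + " on " + site);
        grid.runJob(site, bundle);                             // steps 6-7
        grid.stageOut(site, outputDir);                        // step 7

        status.publish("task finished: " + modelId);           // steps 8-9
    }
}
```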

5. Technology Components

We have described the flow through the control system and identified the key components of the architecture. In this section we discuss in greater detail the design issues, technology choices and implementation of the architecture components. As noted earlier, our architecture is based on existing open source grid middleware and web services tools such as Globus[2], the Open Grid Computing Environment (OGCE)[4] and WS-Messenger[10]. We describe each of the components in detail below.

5.1. Data Management

The data transport system in SCOOP is based on Unidata's Local Data Manager (LDM). LDM allows us to select, capture, manage, and distribute arbitrary data products over a networked set of computers. LDM is designed for event-driven data distribution where a client may ingest data. In addition, an LDM server can communicate with other LDM servers to either receive or send data. LDM is flexible and allows for site-specific configuration and processing actions on the data. The ADCIRC model receives its upstream wind and meteorological data through LDM, and the model results are sent downstream to other SCOOP partners through LDM for archiving and visualization. LDM allows us to associate triggers with arriving data that can be used for launching automated model runs. In the long term we anticipate that there might be multiple ways that the data might arrive; in this case, the model runs may need to be triggered by a higher-level component.

We also use GridFTP to manage data movement during model execution. In addition, we use the SCOOP catalog[23] to locate data files that may have been generated previously. If available, the files are retrieved from the SCOOP archives[17]. The two types of files retrieved from the archive are the hotstart files to initialize the model run and the netCDF wind files. The wind files arrive through LDM for the forecast runs but may need to be retrieved from the archive for the hindcast runs. In addition, this gives us the ability to use the wind files from the archive to reduce data movement costs during forecast model execution.

5.2. Grid Middleware

In the last few years, there has been increased deployment of Grid technologies on commodity clusters. These clusters are used to run scientific applications and are shared across different organizations, forming large interdisciplinary virtual communities. For our system we assume a minimal software stack composed of existing grid technologies and protocols to manage jobs and files, namely Grid Resource Allocation and Management (GRAM)[2] and GridFTP[6] from the Globus toolkit. Additionally, the Globus Monitoring and Discovery System (MDS)[28] and the Network Weather Service (NWS)[27] configured at a site are used to make a more informed resource selection. During the resource selection process, each of the sites is queried for its queue status and the bandwidth to each site. The resource selection process is described in greater detail in §5.5.

Once a resource is selected, a credential to be used at this site is obtained from a MyProxy[3] server. MyProxy is a credential management service that stores Globus X.509 certificates. MyProxy allows users to store their certificates and private keys in the repository, making them accessible from different distributed resources. MyProxy issues a short-lifetime certificate to the system that can then be used to authenticate to the remote system.

5.3. Application Coordinator

The Application Coordinator acts as a central component for each of the model runs, whether initiated by the user through the portal or triggered by the arrival of data through LDM. It uses the resource selection component to select a grid site. After the user proxy is obtained, the Application Coordinator is able to perform Grid operations on behalf of the user (in the case of the hindcast) or a preconfigured user (for the forecast). The application manager invokes a specified script to generate a self-extracting package of the application for the particular remote site. This self-extracting package is transferred to the remote site using GridFTP. Once the file is transferred, the job is submitted to the Globus gatekeeper using the GRAM protocol. The GRAM protocol also allows users to poll for the status of the jobs or associate listeners that get invoked when the job status changes. Additionally, when the job completes, we use GridFTP to retrieve the compressed set of output files.

The Application Coordinator has been designed to take configuration parameters about the application, its requirements and environment. This module supports running ADCIRC with different grids for different geographical regions and configurations. More recently, the module is being customized to be used with different meteorological models.

5.4. Application Preparation

In this work, we assume that the need for urgent computing may necessitate situations that result in ad-hoc, quick social arrangements to make resources available during a major storm or weather event. This has implications on how much and what we can expect a site to have installed and/or preconfigured. It is possible that the binaries may not be installed on the target resource. Once a resource for a particular ensemble member is selected, we need to create the application package that will be needed for that particular resource. We create a self-extracting archive file using an open source product called makeself. The self-extracting archive file contains everything that is needed for a model run and is the only file that is transferred to the selected grid resource. While in this particular work the module contains the binary as well, it is possible to use this approach for applications which might be preinstalled at sites. The specific steps involved in creating this bundle (sketched in code below) include:

- Running a program that converts the netCDF version of the input wind file to a version compatible with the ADCIRC model.
- Selecting the correct set of ADCIRC executables for the given resource architecture and model run.
- Identifying specific arguments that are required at the remote end when the bundle is extracted, e.g. the actual MPI command for ADCIRC might vary slightly on different resources.
- Finally, creating the compressed file containing the binary and all the input data.

As described previously in §2, these model runs are usually hotstarted with the previous day's model results. The Application Preparation module checks a "correspondence" description file to identify the type of hotstart file required for a particular wind type and grid. It checks to see if this file has been generated previously and is available either locally or remotely in the archives. If the file does not already exist, it checks to see if another process is currently running that might generate it. If the file is being generated by another process it waits for that process to complete; otherwise a process is launched to generate the hotstart file.
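The bundle-creation steps listed above can be summarized in code. The sketch below is hypothetical: the helper methods stand in for the wind-file conversion, executable selection and makeself invocation, and the paths, file names and architecture layout are illustrative only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical sketch of the application-preparation step: assemble a
 * staging directory with the converted wind file, the right executables
 * and run arguments, then wrap it into one self-extracting archive
 * (the real system uses the makeself tool for the last step).
 */
public class ApplicationPackager {

    Path prepareBundle(Path windFileNc, String targetArch, String mpiCommand)
            throws IOException, InterruptedException {
        Path staging = Files.createTempDirectory("adcirc-bundle");

        convertWindFile(windFileNc, staging.resolve("wind.input"));  // netCDF -> model input
        copyExecutables(targetArch, staging);                        // per-architecture binaries
        Files.write(staging.resolve("run.conf"),                     // remote-side arguments
                ("MPI_COMMAND=" + mpiCommand + "\n").getBytes());

        return makeSelfExtracting(staging);                          // compressed bundle
    }

    // The three helpers below are placeholders for the real conversion,
    // selection and makeself steps; they are not part of any published API.
    void convertWindFile(Path in, Path out) throws IOException {
        Files.copy(in, out);                                         // stand-in for the converter
    }

    void copyExecutables(String arch, Path staging) throws IOException {
        Path binary = Paths.get("/opt/adcirc", arch, "padcirc");     // assumed install layout
        Files.copy(binary, staging.resolve("padcirc"));
    }

    Path makeSelfExtracting(Path staging) throws IOException, InterruptedException {
        Path bundle = staging.resolveSibling(staging.getFileName() + ".run");
        // makeself <dir> <file> <label> <startup script>; check against the local install.
        new ProcessBuilder("makeself.sh", staging.toString(), bundle.toString(),
                "ADCIRC run", "./run_adcirc.sh").inheritIO().start().waitFor();
        return bundle;
    }
}
```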

5.5. Resource Selection

The Grid sites vary greatly in performance and availability. Even with pre-established arrangements for exclusive access, resources and/or services may be down or unavailable. Hence, given the criticality of the model run completion, we choose to use a dynamic resource selection algorithm to select an appropriate site for the job submission.

During the resource selection process, each of the sites is queried for its queue status and bandwidth. Globus MDS[28] is an information service that aggregates information about resources and services that are available at a site. The Network Weather Service (NWS)[27] is a sensor-based distributed system that periodically monitors and dynamically forecasts performance measurements such as CPU and bandwidth. We have developed a simple plug-in based resource-ranking library. While currently we use only real-time information, the library is flexible in allowing us to collect historical information to make better and more accurate predictions. The question we try to answer in our resource selection is "Where should I run this job right now?" The library is built on top of the Java CoG Kit[24] and uses the standard libraries for querying resources. The framework is completely extensible and can easily accommodate more sophisticated algorithms in the future.

The resource selection first searches a list of remote resources to confirm availability in terms of appropriate authentication and authorization access to the resource and to ascertain that the basic Globus services such as GridFTP and GRAM are running. All remote resources meeting the above requirements are then ranked according to real-time information including queue status and bandwidth. This allows us to balance the implications of data movement costs against computational running time. Based on the queue and the bandwidth, a total time estimate for each resource is calculated to rank the resources. The algorithm takes approximate running times for the model and the data sizes as input to perform this calculation.
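A minimal version of the ranking calculation described above might look like the following. The cost model (transfer time from bandwidth, plus queue wait, plus an estimated run time) follows the text, but the class and field names are invented and the real library also checks authentication and service availability first.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Minimal sketch of the "where should I run this job right now?" ranking. */
public class ResourceRanker {

    /** Real-time measurements for one candidate site (from MDS/NWS queries). */
    static class SiteStatus {
        final String contact;
        final double queueWaitSeconds;      // estimated batch-queue delay
        final double bandwidthMBps;         // measured bandwidth to the site

        SiteStatus(String contact, double queueWaitSeconds, double bandwidthMBps) {
            this.contact = contact;
            this.queueWaitSeconds = queueWaitSeconds;
            this.bandwidthMBps = bandwidthMBps;
        }
    }

    /** Total-time estimate: data movement + queue wait + approximate run time. */
    static double estimateSeconds(SiteStatus s, double dataSizeMB, double runTimeSeconds) {
        double transfer = dataSizeMB / Math.max(s.bandwidthMBps, 0.01);
        return transfer + s.queueWaitSeconds + runTimeSeconds;
    }

    /** Ranks candidate sites; the best site is the first element of the result. */
    static List<SiteStatus> rank(List<SiteStatus> sites, double dataSizeMB, double runTime) {
        return sites.stream()
                .sorted(Comparator.comparingDouble(
                        (SiteStatus s) -> estimateSeconds(s, dataSizeMB, runTime)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SiteStatus> sites = Arrays.asList(
                new SiteStatus("siteA.example.edu", 600.0, 10.0),
                new SiteStatus("siteB.example.edu", 60.0, 2.0));
        // 500 MB of input data, ~30 minutes of model run time.
        System.out.println("best site: " + rank(sites, 500.0, 1800.0).get(0).contact);
    }
}
```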

5.6. Portal

In addition to timely execution of the model, it is important to be able to share the data with the community at large while shielding the consumers of the information from the complexity of the underlying system. We use an Open Grid Computing Environment (OGCE) based portal interface to make available the status of the runs and the output files from the daily forecast runs. Figure 4 shows the status of the model runs and Figure 5 shows the results available from the portal. A color marker shows the current state of the run (i.e. "data arrived", "running", etc.). In addition, end users can use the portal to launch hindcast model execution (Figure 6) in a grid computing environment using the files from the SCOOP archives.

Figure 4. Job status and resource status from the portal
Figure 5. Job history and result files from the portal
Figure 6. Hindcast mode from the portal

5.7. Fault Tolerance and Recovery

We apply a number of techniques to diagnose and repair errors that might occur during run-time, using a two-phase approach in the ADCIRC Application Manager. The first phase uses retries in the event of a failure or a timeout, and the step is retried a specified number of times. If the retries do not resolve the failure, a "persistent" error has occurred. The execution of the application coordinator has distinct phases (move files, run job, etc.), and persistent errors may occur in one of these labeled phases. A persistent error causes the decoder to retry beginning at an appropriate earlier phase. In addition, certain kinds of persistent errors, such as a failure to successfully transfer a file to a selected resource, cause that resource to be omitted from consideration during the resource selection phase of the retry. This error handling allows the complete execution of model runs under many different adverse circumstances, taking advantage of the inherent redundancy in a grid-enabled environment. The application manager can easily detect errors and take appropriate rectification action. However, sometimes errors might occur at the model level, producing garbled data, or a process might run longer than expected and not produce the output. In future implementations, we anticipate that we will need additional error checking to detect these scenarios and decrease the probabilities of failures.
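The two-phase retry logic can be summarized as in the sketch below. The phase names and retry limits are illustrative, and the real Application Manager also blacklists a resource after persistent transfer failures, as described above.

```java
/** Illustrative two-phase retry: per-step retries, then restart from an earlier phase. */
public class RetryPolicy {

    interface Step { void run() throws Exception; }

    /** Phase 1: retry a single step a bounded number of times. */
    static void runWithRetries(String phase, Step step, int maxRetries) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                step.run();
                return;                            // success, no persistent error
            } catch (Exception e) {
                last = e;                          // transient failure, try again
            }
        }
        throw new Exception("persistent error in phase " + phase, last);
    }

    /** Phase 2: on a persistent error, restart from an earlier phase (e.g. reselect resource). */
    static void execute(Step selectResource, Step moveFiles, Step runJob) throws Exception {
        int restarts = 0;
        while (true) {
            try {
                runWithRetries("select-resource", selectResource, 2);
                runWithRetries("move-files", moveFiles, 3);
                runWithRetries("run-job", runJob, 1);
                return;
            } catch (Exception persistent) {
                if (++restarts > 3) throw persistent;   // give up after a few full restarts
                // A real implementation would also drop the failing site from the pool here.
            }
        }
    }
}
```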

5.8. Monitoring and Notification

A central component of our design is proactive monitoring of the status of the application and data. This monitoring system is based on standard tools and techniques such as the Network Weather Service[27] and instrumentation points at various points of the data flow. The key to managing a distributed adaptation framework is a standard messaging interface. Our messaging interface is based on the workflow tracking tools and eventing system (WS-Messenger)[10] being built as part of another NSF ITR project, LEAD (Linked Environments for Atmospheric Discovery)[26]. Every component in our system publishes status information such as "input data arrived", "task started", "task finished", etc. This status information is available through the portal interface (Figure 5). In addition, the resource monitoring portlet reads a web service we created that serves CPU availability and network bandwidth data. The data itself is currently collected using MDS, NWS and the LEAD eventing system and then stored in a MySQL database.
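Status publication is the glue between the coordinator, the portal and the monitoring portlets. The sketch below shows the idea with a hypothetical in-process publisher; the real system publishes these events through WS-Messenger and stores the resource measurements in MySQL, as described above.

```java
import java.time.Instant;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical publish/subscribe sketch for run-status events (stand-in for WS-Messenger). */
public class StatusEventBus {

    /** One status event, e.g. "input data arrived", "task started", "task finished". */
    static class StatusEvent {
        final String runId;
        final String message;
        final Instant timestamp;

        StatusEvent(String runId, String message, Instant timestamp) {
            this.runId = runId;
            this.message = message;
            this.timestamp = timestamp;
        }
    }

    private final List<Consumer<StatusEvent>> subscribers = new CopyOnWriteArrayList<>();

    /** Interested components (e.g. the portal's status portlet) register a callback. */
    void subscribe(Consumer<StatusEvent> subscriber) {
        subscribers.add(subscriber);
    }

    /** Components publish status at each step; all subscribers are notified. */
    void publish(String runId, String message) {
        StatusEvent event = new StatusEvent(runId, message, Instant.now());
        for (Consumer<StatusEvent> s : subscribers) {
            s.accept(event);
        }
    }

    public static void main(String[] args) {
        StatusEventBus bus = new StatusEventBus();
        bus.subscribe(e -> System.out.println(e.timestamp + " [" + e.runId + "] " + e.message));
        bus.publish("ens-member-07", "input data arrived");
        bus.publish("ens-member-07", "task started");
        bus.publish("ens-member-07", "task finished");
    }
}
```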

6. Deployment Experiences

Various components of the framework have been tested and deployed in the context of hurricane storm surge over the past two years. In this section we briefly describe our evaluation and experiences.

6.1. Resource Pool Management

The following SCOOP partner and SURAGrid sites have been tested and added to the resource pool for ADCIRC: local resources at the Renaissance Computing Institute (RENCI), Texas A&M University (TAMU), University of Florida (UFL), University of Alabama in Huntsville (UAH), and University of Louisiana at Lafayette (ULL). Each of the sites runs basic Globus grid services such as the Gatekeeper for job submission, GridFTP for file transfer, an information service and the Network Weather Service. Our current infrastructure is based on the pre-web-service protocol stack available in Globus versions 2.x through 4.x. It is important that the basic Globus services are configured correctly at all sites that might be used for the model runs. We have a test suite that is used to verify that the basic services are running and configured correctly at all sites. The test suite verifies the access rights, firewall, configuration of the Globus services and the batch scheduler that might be configured at the site. The sites are tested periodically to verify correct operation; the test suite helps detect and diagnose errors more proactively.

To easily add resources to the pool, we use configuration properties. This allows us to add other resources to the pool without any programmatic changes. The properties include the addresses for the Globus services, firewall port information and security credentials that can be used for a resource.

6.2. Application Coordinator

The application coordinator is configured using a property file, allowing easy addition of model configuration parameters. An application can use the framework by supplying application-specific properties and scripts for creating the packaging. As mentioned, the framework is being applied to the North Carolina Forecasting System to run ADCIRC with different grids and other meteorological models. Our early experiences showed the need for higher resilience and fault tolerance in the Application Coordinator to recover from various errors that might occur during execution. This was then built into the more recent version of the Coordinator. We are currently planning on wrapping these capabilities as web services, allowing for more widespread use in the Grid framework and workflow tools. Our resource selection algorithm is simplistic, but more generally the framework we have developed allows us to easily integrate other, more sophisticated algorithms that are being researched in the Grid community.
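A very small piece of such a site test suite is sketched below: a TCP-level reachability check against the standard pre-WS Globus ports (2119 for the GRAM gatekeeper, 2811 for GridFTP). This is only connectivity probing; the actual suite also exercises authentication, the batch scheduler and real job submission, and the host name here is a placeholder.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Minimal connectivity probe for the pre-WS Globus services at a grid site. */
public class SiteProbe {

    static boolean reachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;                       // firewalled, down or misconfigured
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "gridnode.example.edu";
        // Default ports for the pre-WS stack; sites may remap them in their firewall policy.
        System.out.println("GRAM gatekeeper (2119): " + reachable(host, 2119, 3000));
        System.out.println("GridFTP (2811): " + reachable(host, 2811, 3000));
    }
}
```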

7. Related Work

Grid computing has been increasingly used to run scientific applications from different domains including earthquake engineering, bioinformatics, astronomy, meteorology, etc. Our framework specifically addresses the need for increased reliability, fault tolerance and recovery in the context of time-sensitive applications such as storm surge prediction. Grid scheduling and adaptation techniques based on evaluating system and application performance have been used to make scheduling and/or rescheduling decisions, and heuristic techniques are often used to qualitatively select and map tasks to available resource pools[25]. Our resource selection algorithm is fairly simplistic and only considers queue status and bandwidth measurements to make a decision. While simplistic, it works effectively in our current resource environment. The API has been designed to be flexible to allow easy addition of other, more sophisticated algorithms in the future.

8. Conclusions and Future Work

This framework provides a solid foundation on which to build a highly reliable Grid environment for applications that might be time sensitive and/or critical. An enhancement to the computational system currently being developed is the selection of the ADCIRC model grid based on the predicted storm landfall location. We envision a suite of ADCIRC domains with the same basic open-ocean detail, but with different grids resolving different parts of the coastal region and supporting flooding and surge inundation.

Grid and portal standards have been a moving target for a few years now. Our software stack for this work was guided by the state of the art at the time of the project's inception. More recently, implementations of the portlet standard (JSR 168) and grid standards (WSRF) have stabilized, and we will transition to support Globus 4.0 web services and OGCE-2 for our portlets. Our experiences with building and deploying the framework emphasize the need for increased fault tolerance and recovery techniques to be implemented in real Grid environments. We are investigating standardized web services interfaces that will allow applications to be easily run in a Grid environment with capabilities such as resource selection and fault tolerance. In addition, user-friendly modules that allow scientists to specify the properties needed by the Application Coordinator are being investigated. Data collected from the operation of the framework during the hurricane season will drive further evolution of the framework.

9. Acknowledgements

This study was carried out as a component of the "SURA Coastal Ocean Observing and Prediction (SCOOP) Program", an initiative of the Southeastern Universities Research Association (SURA). Funding support for SCOOP has been provided by the Office of Naval Research, Award N00014-04-1-0721, and by the National Oceanic and Atmospheric Administration's NOAA Ocean Service, Award NA04NOS4730254. We would also like to thank the various SCOOP partners for discussion on the use cases: Philip Bogden (SURA and GoMOOS); Will Perrie, Bash Toulany (BIO); Charlton Purvis, Eric Bridger (GoMOOS); Greg Stone, Gabrielle Allen, Jon MacLaren, Bret Estrada, Chirag Dekate (LSU, Center for Computation and Technology); Gerald Creager, Larry Flournoy, Wei Zhao, Donna Cote and Matt Howard (TAMU); Sara Graves, Helen Conover, Ken Keiser, Matt Smith, and Marilyn Drewry (UAH); Peter Sheng, Justin Davis, Renato Figueiredo, and Vladimir Paramygin (UFL); Harry Wang, Jian Shen and David Forrest (VIMS); Hans Graber, Neil Williams and Geoff Samuels (UMiami); and Mary Fran Yafchak, Don Riley, Don Wright and Joanne Bintz (SURA). We would like to thank various SCOOP and SURAGrid partners for making resources available, and special thanks to Steven Johnson (TAMU), Renato J. Figueiredo (UFL), Michael McEniry (UAH), Ian Chang-Yen (ULL), and Brad Viviano (RENCI) for providing valuable system administrator support.

10. References

1. I. Foster, C. Kesselman and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," International Journal of Supercomputer Applications, 15(3), 2001.
2. I. Foster and C. Kesselman, "Globus: A Metacomputing Infrastructure Toolkit," International Journal of Supercomputer Applications, 11(2):115-128, 1997.
3. J. Novotny, S. Tuecke and V. Welch, "An Online Credential Repository for the Grid: MyProxy," Proceedings of the Tenth International Symposium on High Performance Distributed Computing (HPDC-10), August 2001.
4. Open Grid Computing Environment, http://www.collab-ogce.org/nmi/index.jsp
5. M. Russell, G. Allen, I. Foster, E. Seidel, J. Novotny, J. Shalf, G. von Laszewski and G. Daues, "The Astrophysics Simulation Collaboratory: A Science Portal Enabling Community Software Development," Proceedings of the Tenth International Symposium on High Performance Distributed Computing (HPDC-10), pp. 207-215, 2001.
6. W. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnal and S. Tuecke, "Data Management and Transfer in High Performance Computational Grid Environments," Parallel Computing, 28(5), pp. 749-771, May 2002.
7. I. Foster, C. Kesselman, G. Tsudik and S. Tuecke, "A Security Architecture for Computational Grids," Fifth ACM Conference on Computer and Communications Security, pp. 83-92, 1998.
8. L. Pearlman, C. Kesselman, S. Gullapalli, B. F. Spencer Jr., J. Futrelle, K. Ricker, I. Foster, P. Hubbard and C. Severance, "Distributed Hybrid Earthquake Engineering Experiments: Experiences with a Ground Shaking Grid Application," NEESGrid Technical Report-2004-42, 2004.
9. Climate of 2005: Atlantic Hurricane Season, http://www.ncdc.noaa.gov/oa/climate/research/2005/hurricanes05.html, 2006.
10. Y. Huang, A. Slominski, C. Herath, and D. Gannon, "WS-Messenger: A Web Services-based Messaging System for Service-Oriented Grid Computing," 6th IEEE International Symposium on Cluster Computing and the Grid (CCGrid06).

9

11. E. Kalnay, “Atmospheric Modeling, Data Assimilation and Predictability,” Cambridge University Press, 2003. 12. R.A. Luettich, J. J. Westerink, and N. W. Scheffner, ADCIRC: An advanced threedimensional circulation model for shelves, coasts and estuaries; Report 1: theory and methodology of ADCIRC- 2DDI and ADCIRC-3DL, Technical Report DRP-92-6, Coastal Engineering Research Center, U.S. Army Engineer Waterways Experiment Station, Vicksburg, MS, 1992. 13. G. Holland. An Analytic Model of the Wind and Pressure Profiles in Hurricanes. Monthly Weather Review, Vol. 108, No. 8, pp. 1212–1218, 1980. 14. J. Sivillo, J. Ahlquist, and Z. Toth. An Ensemble Forecasting Primer, Weather and Forecasting, Vol. 12, pp. 809-818, 1997. 15. Unidata Local Data Manager. http://www.unidata.ucar.edu/software/ldm/, 2006 16. P. Bogden, et al, The Southeastern University Research Association Coastal Ocean Observing and Prediction Program: Integrating Marine Science and Information Technology," Proceedings of the OCEANS 2005 MTS/IEEE Conference. Sept. 18-23, 2005, Washington, D.C. 17. D. Huang, G. Allen, C. Dekate, H. Kaiser, Z. Lei and J. MacLaren "getdata: A Grid Enabled Data Client for Coastal Modeling," Published in HPC06. 18. P. Bogden et al., "The SURA Coastal Ocean Observing and Prediction Program (SCOOP) Service-Oriented Architecture," Proceedings of MTS/IEEE 06 Conference in Boston, September 1821, 2006 Boston, MA, Session 3.4 on Ocean Observing Systems. 19. J. Bintz et al., "SCOOP: Enabling a Network of Ocean Observations for Mitigating Coastal Hazards," Proceedings of the Coastal Society 20th International Conference, May 14-17, 2006; St. Pete Beach, FL. 20. SCOOP Website http://scoop.sura.org/, 2006. 21. D. A. Reed, et al., "Building the Bioscience Gateway," Global Grid Forum Technical Paper, June 2005. 22. North Carolina Forecasting System. http://www.renci.org/projects/indexdr.php 23. S. Graves, K. Keiser, H. Conover, M. Smith. “Enabling Coastal Research and Management with Advanced Information

Technology,” 17th Federation Assembly Virtual Poster Session, July 2006. 24. G. von Laszewski, I. Foster, J. Gawor, and P. Lane, "A Java Commodity Grid Kit," Concurrency and Computation: Practice and Experience, vol. 13, no. 8-9, pp. 643-662, 2001, http://www.cogkit.org/. 25. D. Angulo, R. Aydt, F. Berman, A. Chien, K. Cooper,H. Dail, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, M. Mazina, J. Mellor-Crummey, D. Reed, O. Sievert, L. Torczon, S. Vadhiyar, and R. Wolski. Toward a framework for preparing and executing adaptive grid programs. In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS), 2002(41). 26. K. K. Droegemeier et al, “Service-Oriented Environments In Research And Education For Dynamically Interacting With Mesoscale Weather,” IEEE Computing in Science and Engineering, November-December 2005. 27. R. Wolski, N.T. Spring, J. Hayes, “The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing,” Future Generation Computer Systems, 1998. 28. K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman, “Grid Information Services for Distributed Resource Sharing,” Proceedings of the Tenth IEEE International Symposium on HighPerformance Distributed Computing (HPDC-10), IEEE Press, August 2001.


Science Gateways on the TeraGrid
A survey of issues for deployment of community gateway interfaces to shared high-end computing resources

Charlie Catlett, Sebastien Goasguen, Jim Marsteller, Stuart Martin, Don Middleton, Kevin J. Price, Anurag Shankar, Von Welch, Nancy Wilkins-Diehr

November 2006

Abstract
Increasingly, the scientific community has been using web portals and desktop applications to organize their work. The TeraGrid team determined that it would be important to create a set of capabilities that would allow TeraGrid services and resources to be integrated, potentially in a transparent way, with these scientific computing environments. This paper outlines the "Science Gateways" program and provides an overview of key lessons learned in developing mechanisms to allow for such integration.

Contents
Background
Accounting
Security
Risk Mitigation
Federated Identity Management
Metrics and Successful Peer Review
Conclusions and Further Work: Science Gateway Primer
References

1 Background
In 2004, the National Science Foundation's TeraGrid facility [i] was made available to the national academic community after a 2-year period of construction and early access. The initial facility consisted of a homogeneous environment of four Itanium-1 based clusters deployed at four high-performance computing sites linked by a dedicated 40 Gb/s network, with an aggregate of roughly 15 Teraflops of computational power. Today the facility includes over 20 computational resources, of a wide variety of architectures, at nine sites. In aggregate, TeraGrid provides over 140 Teraflops of computational power and will grow to over 560 Teraflops in 2007. Beyond computational resources, TeraGrid includes high-performance data archives, a growing number of public data collections, and a suite of remote visualization services. Initially, TeraGrid use consisted primarily of traditional client-server interactions, with grid functionality such as single sign-on, parallel data transfer, and remote job submission, as well as a new allocation scheme that allowed users to obtain allocations "redeemable" on any TeraGrid platform rather than tied to a particular system as in the past. In 2006 the TeraGrid software architecture was augmented with web services, moving to a service-oriented architecture to support new usage paradigms such as workflow and access from within applications or web portals.

Concurrent with the TeraGrid effort, many communities have developed customized interfaces to cyberinfrastructure – using emerging technologies such as web portal platforms and web services. Web access to data collections and compute capabilities are increasingly commonplace, though there are a variety of approaches and technologies used within the community, with a resulting variety of architectures and approaches to building such ‘science gateways.’ For example, the Protein Data Bank [ii] provides an electronic collection of structures of biological macromolecules as well as tools and resources for studying these structures. Similarly, the Network for Earthquake Engineering Simulation Cyberinfrastructure Center [iii] serves those engaged in earthquake engineering research by providing data sharing and simulation capabilities. The Network for Computational Nanotechnology [iv] provides course material, collaboration and simulation capabilities and more to those studying nanotechnology. The National Virtual Observatory [v]provides access to very large digital sky surveys and includes analysis capabilities. There are many such examples. In 2005 the TeraGrid team initiated a “Science Gateways program” to broaden the availability of TeraGrid resources and services by providing access through community-designed interfaces. Through this program we are effectively

adapting the TeraGrid to the work environment chosen by the user rather than requiring the user to learn and adapt to the TeraGrid environment. Working with developers of community infrastructure, we couple powerful TeraGrid resources to the “back end” of familiar front-end interfaces developed by research communities. By providing a rich set of services to gateway developers, they are able to provide greater capabilities to individual scientists without requiring them to invest in learning how to adapt their work to the TeraGrid environment. Traditionally, access to highperformance computing resources is granted to individuals, each of whom is provided with their own username and password. These individuals generally log on to a particular computing platform and submit compute jobs to a queuing system. Providing access to high end resources for a community through a shared interface involves a different, indirect, relationship between the resource provider and the end-user, and this impacts many aspects of the system ranging from authorization to usage accounting. Developing mechanisms to support this new model requires adaptation both for the resource providers and for the community interface developers. In this overview we outline some of these issues. Perhaps most importantly, the Science Gateway model provides an effective and efficient mechanism for providing specialized services (such as highperformance computing) to much larger user communities than was possible in

the past. Individual community proxies or organizations can aggregate resources for the larger group while identifying the common community services that will deliver the highest impact. This leads to a collaborative endeavor between those who provide general cyberinfrastructure services (such as TeraGrid), those who provide discipline-specific cyberinfrastructure (such as a science gateway team), and the scientists themselves. Such interactions are critical to developing and evolving a national cyberinfrastructure that is able to improve the productivity of individual scientists. We will discuss several areas that the TeraGrid has found critical to supporting gateway interfaces to shared resources.

2 Accounting
Some gateways plan to support tens of thousands of users. Creating individual logins for all of these users on all TeraGrid resources presents significant scaling challenges. One simplification that can streamline access to resources is a shared community account. Shared community accounts are not accounts where many users access resources by entering the same username and password. Rather, the accounts are managed by the gateway developer and used to run codes on behalf of the scientist through the gateway interface. Individual users may register with a domain-specific front end, and access common community computational services that are provided

via the TeraGrid using a single gateway account/username. Each individual using the gateway need not have an individual account on the TeraGrid; the gateway provides a collective account. While this level of anonymity can simplify access for end users, there are a number of additional tools required to provide the accountability and security necessary for NSF-funded resources. Developers need additional tools to trace community account resource usage back to individual gateway users and security staff need additional tools to restrict logins that provide web interfaces to supercomputing resources. We will describe these as well as additional capabilities necessary for developers in a complex multi-site environment. In a shared environment, gateway developers will need to track use of TeraGrid resources and attribute this use to individuals logged on to the gateway. In order to correctly attribute usage to an individual gateway user, a developer must be able to determine how many CPU hours were consumed by a job launched by that user on the TeraGrid. While seemingly straightforward in nature, we found that additional capabilities were necessary to facilitate this tracking. Many gateway developers use capabilities provided by the Globus Toolkit [vi]to access TeraGrid resources. For example, Grid Resource Allocation and Management (GRAM) [vii] is typically used to support remote job submission and monitoring., However, when a job finishes there is no

straightforward way for the gateway to determine how many CPU hours the job consumed. That information is critical to attributing usage to individual users using a Science Gateway account on the TeraGrid. To enable this functionality, the Globus team defined and created a Web Services (WS) interface and associated mechanisms to provide access to audit and accounting information associated with Grid services. This auditing system was designed to be scalable, secure, and open so that any grid service that TeraGrid deploys can follow this design. First, the Globus Toolkit’s GRAM2 (PreWS GRAM) and GRAM4 (WS-GRAM) services were enhanced to create audit records that are written to a database local to the GRAM services. So there will be many audit databases created everywhere TeraGrid’s grid services are deployed. These GRAM audit databases and records provide a persistent link between the grid service’s job id and the local resource manager’s (LRM) job id. Next, based on use cases and requirements, Open Grid Services Architecture-Data Access and Integration (OGSA-DAI) was selected to provide a service interface for TeraGrid’s audit and accounting information. OGSA-DAI is a Globus Toolkit Web Services Resource Framework (WSRF) service that can create a single virtual database from two or more remote databases. A new OGSA-DAI perform document was written which defines the WS operation for returning TeraGrid Service Units (SUs) given a grid job id.

Gateway developers can then use this new interface to manage and account for their allocation on a per job basis. There are some tricky details to accurately retrieving the correct usage for a job from TeraGrid’s central database that are best codified in a service/configuration and not exposed to gateway developers. OGSA-DAI can be deployed either centrally providing a TeraGrid-wide usage query service for all jobs, or deployed locally along with each GRAM4 service deployment, providing a local usage query service for local jobs. TeraGrid can decide (and change) which is most strategic. Gateways will now be able to remotely submit jobs to TeraGrid and account for usage on a per job basis without needing to understand the details of the various local resource managers chosen by TeraGrid resource providers. We feel this accounting capability will be very useful for other projects where per-job usage information is needed. These types of enhancements are essential toward reducing the complexity for gateways to interface with TeraGrid’s computational resources, as well as, allowing TeraGrid to simultaneously support an increasing number of gateways.
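As an illustration of the accounting flow described above, the following minimal Java sketch shows the kind of per-job lookup a gateway might perform once the audit link exists: join the GRAM audit record for a grid job ID to the local accounting record and return the charged service units. The JDBC connection string and the table and column names (gram_audit, job_usage, grid_job_id, local_job_id, charged_sus) are hypothetical placeholders; in the deployed system this query is hidden behind the OGSA-DAI service interface rather than issued directly by the gateway.

// Sketch only: per-job usage lookup, assuming a hypothetical schema that mirrors
// the GRAM audit record (grid job ID <-> local job ID) and an accounting table.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class UsageLookup {
    public static double susForGridJob(String jdbcUrl, String gridJobId) throws Exception {
        String sql = "SELECT u.charged_sus FROM gram_audit a "
                   + "JOIN job_usage u ON u.local_job_id = a.local_job_id "
                   + "WHERE a.grid_job_id = ?";
        try (Connection c = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = c.prepareStatement(sql)) {
            ps.setString(1, gridJobId);
            try (ResultSet rs = ps.executeQuery()) {
                // If accounting has not caught up with the job yet, report zero SUs.
                return rs.next() ? rs.getDouble(1) : 0.0;
            }
        }
    }
}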

3 Security
3.1 Risk Mitigation
Additional risks can also arise when providing community accounts and web interfaces to high-performance resources. The TeraGrid security working group has analyzed these risks and is developing approaches to mitigate them. Security officers at each site are alerted when a community account is requested, and these accounts are uniquely identified in the TeraGrid central database. TeraGrid resource sites may take independent approaches to account restriction. The current approach being suggested is a command-based restriction approach, where a community account may only run certain commands, e.g. only commands in a specific directory. At least one TeraGrid site is using this approach. We believe it provides the necessary security and gateway flexibility when deploying many applications on TeraGrid. One software package currently being developed at NCSA to suit this purpose is the Community Shell, or Commsh [ix]. Commsh allows for two methods of account restriction. The first method is an implementation of the command-based restriction described in the previous paragraph. Under this method, a configuration file is created that defines which commands (or sets of commands) a given account can execute. These commands can be specified using wildcards and regular expressions to create a flexible command restriction framework. The second method is change-root (or chroot) jailing. Change-root jailing effectively creates a filesystem-based "sandbox" for the account, only allowing commands to be executed from within this sandbox. An additional utility, Chroot_jail, can be used to help construct and manage these change-root jails. An adapter exists that allows Commsh to operate in command-based restriction mode for GRAM job submissions. Unfortunately, this adapter does not support change-root jailing at this time. Gateways may also want to implement shut-off mechanisms so that, in the event of a problem, jobs sent to TeraGrid can be restricted for a given gateway user without shutting down the entire community account. As an intermediate step, TeraGrid will be providing a web interface for system administrators to contact developers with information regarding a problematic local job ID. In the long term, TeraGrid would like to provide a service to enable automatic gateway-level account shut-off.
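The command-based restriction idea can be illustrated with a short, hypothetical Java sketch (it is not the Commsh implementation): allowed commands are expressed as regular-expression patterns in a configuration, and any submitted command line that matches none of them is rejected before execution.

// Sketch of command-based restriction; the policy pattern and paths are invented examples.
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class CommandFilter {
    private final List<Pattern> allowed;

    public CommandFilter(List<String> patterns) {
        this.allowed = patterns.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    public boolean isAllowed(String commandLine) {
        return allowed.stream().anyMatch(p -> p.matcher(commandLine).matches());
    }

    public static void main(String[] args) {
        // Hypothetical policy: only executables under the gateway's application directory.
        CommandFilter filter = new CommandFilter(List.of("/usr/local/gateway/apps/.*"));
        System.out.println(filter.isAllowed("/usr/local/gateway/apps/model_run --input case1")); // true
        System.out.println(filter.isAllowed("/bin/bash"));                                        // false
    }
}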

3.2 Federated Identity Management
In the traditional mode of operation, prior to the Science Gateway model, each resource or resource-providing site was responsible for the management of its users' identities (a term we use to encompass a user's username, password, attributes and privileges). The Science Gateway model brought an out-sourcing of identity management from the resource to the gateway, shifting the responsibility for authentication and authorization from the resource to the gateway [xi].

To achieve maximum scalability, the goal is to shift identity management all the way back to the user's home institution (i.e. their campus or place of employment) and leverage the existing identity management infrastructure. Mechanisms to achieve this based on Shibboleth, GridShib, myVocs and other technologies are currently being evaluated by TeraGrid [x].

4 Metrics and Successful Peer Review
Metrics of success are commonly requested for government-funded programs. Successful gateway design will allow principal investigators to highlight gateway usage as well as science accomplishments due to the gateway. In the long term, gateways may set up a mechanism for researchers to cite the use of the gateway in publications. Success both in funding the gateway and in requesting TeraGrid resources can be traced to scientific accomplishments and a history of publications. For example, the DOE-sponsored Earth System Grid (ESG) project [viii] includes a Metrics Service that tracks logins, file and aggregation downloads, browse and search requests, and the total volume of activity conducted via its portal. Similar to many other funded projects, this information is very useful to principal investigators and sponsors in terms of determining the overall impact of the project. As ESG begins to utilize TeraGrid resources, it will need to track

computational and data services that are delivered to it as a Science Gateway.

At the same time, we need to understand the degree to which the TeraGrid contributes to the success of a given project. In the large, this is a fairly straightforward thing to quantify, but there will need to be an interplay between the Science Gateway projects and the TeraGrid where utilization of resources is associated with impact on the science community.

5 Conclusions and Further Work: Science Gateway Primer
Gateway and portal deployment is an extremely active area, and one objective of the TeraGrid Science Gateway program is to develop the standard processes and approaches necessary so that gateways can be enabled in a routine fashion. We have pursued this approach in such a way as to minimize TeraGrid-specific requirements for gateways, thus making it straightforward for a gateway project to use resources from TeraGrid as well as other grid facilities. To this end, an important aspect of the program has involved capturing and codifying lessons and approaches in the form of a "primer." The initial version of the primer, based on initial experiences integrating TeraGrid resources with gateways, describes available resources and services for gateways as well as the requirements necessary for making use of those resources.

To facilitate accurate and timely information in such a dynamic area, the primer has been made available as a Wiki (http://www.teragridforum.org/mediawiki) and will serve as the basis for Science Gateway documentation for TeraGrid resource integration. We expect contributions from a number of active gateway developers to ensure both accurate and timely information on TeraGrid resources and services, and we also see the Wiki as an active repository for the many tools used in gateway development. We believe this type of community involvement will provide a rich collection of information for others and will facilitate use of high-end resources in an increasing number of gateways.

The primer describes TeraGrid resources and services available to Science Gateways, requirements for using TeraGrid resources, best practices when designing a gateway, and includes a software contribution area. TeraGrid provides a variety of services to gateway developers in addition to hardware and software resources. Developers also have additional responsibilities for securing community accounts and tracking usage. Accounting services include a variety of accounts made available to gateway developers, including single-user accounts, community accounts and, in the future, dynamic accounts. The primer includes links to all computing, data and visualization resources available through TeraGrid. It describes the types of software available – the Common TeraGrid Software Stack, third-party packages installed on the TeraGrid and maintained by TeraGrid staff, and community software areas available to gateway developers for their own software deployment efforts. In addition, TeraGrid external relations staff are available to assist in publicizing gateway successes. Requirements outlined in the primer include additional information to be provided when requesting community accounts, recommended audit trails for usage tracking and mechanisms to restrict problem jobs. Best practices described cover gateway planning, design, implementation, operation and metrics collection, as well as desirable gateway characteristics.

The goal of the TeraGrid Science Gateway program is to provide streamlined access to developers wishing to integrate high-end resources into their portals and desktop applications. The Science Gateway team will continue to develop the processes and software functionality necessary to make this possible. Near-term future work will address generalized Web Service interfaces to TeraGrid resources and an attribute-based authentication testbed to investigate scaling and accounting issues faced by large communities.

6 References

[i] More information about TeraGrid can be found at http://www.teragrid.org
[ii] Protein Data Bank – http://www.pdb.org
[iii] Network for Earthquake Engineering Simulation – http://it.nees.org
[iv] Nanohub – http://www.nanohub.org
[v] National Virtual Observatory – http://www.us-nvo.org
[vi] Globus Toolkit – http://www.globus.org
[vii] Grid Resource Allocation and Management – http://www.globus.org/alliance/publications/papers.php#GRAM97
[viii] Earth System Grid – http://www.earthsystemgrid.org
[ix] Community Shell (Commsh) – http://security.ncsa.uiuc.edu/research/commaccts/
[x] Von Welch, Ian Foster, Tom Scavo, Frank Siebenlist, Charlie Catlett, "Scaling TeraGrid Access: A Roadmap for Attribute-based Authorization for a Large Cyberinfrastructure" (draft), http://gridshib.globus.org/tg-paper.html
[xi] Von Welch, Jim Barlow, James Basney, Doru Marcusiu, and Nancy Wilkins-Diehr, "A AAAA Model to Support Science Gateways with Community Accounts," Concurrency and Computation: Practice and Experience, October 2006.

CoaxSim Grid: Building an Application Portal for a CFD Model
Byoung-Do Kim1, Nam-gyu Kim2, Jung-hyun Cho3, Eun-kyung Kim3, Joshi Fullop1

1 National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
2 Korea Institute of Science and Technology Information, Tae-jun, Korea
3 Sookmyung Women's University, Seoul, Korea

Abstract
An application portal has been developed to support the execution of a computational fluid dynamics (CFD) numerical model that simulates hydrodynamic flow instability of the coaxial injector in the liquid rocket engine. The CFD application has been integrated into a web-based portal framework so that users can utilize resources available in a grid computing environment. The portal provides users with a single sign-on access point that connects the grid computing resources available in the user's virtual organization. The portal developed in this project is built in the framework of GridSphere and the Grid Portlet package. In addition to the default services offered by GridSphere, extra service modules have been developed in order to provide customized features for the specific needs of the numerical model. The portal development was conducted as an international collaborative research effort between NCSA and the Korea Institute of Science and Technology Information (KISTI).

1. Introduction
Three important characteristics of a grid computing environment are scalability, accessibility and portability. Scalability represents the ability to offer a diverse set of resources including high-performance computers (HPC), large storage systems, visualization systems, and even special experimental instruments for certain research communities. In order to take advantage of these various resources, the computational application itself needs to be scalable beyond the normal utilization of local resources. The accessibility of the grid environment is realized by the software layer; a central topic in current cyberenvironment research is how to provide better solutions for scientists and engineers to access and utilize grid resources for their research. Another important factor is portability, which guarantees ease of moving a developed solution between heterogeneous computing environments. Achieving the three features mentioned above serves as an effective measure for developing a good problem solving environment (PSE) for a grid computing application. A web-based portal acting as a science gateway may be the most appropriate approach for an existing scientific application to take advantage of available grid technology. A web browser-based portal user interface is a strong candidate for a single sign-on access point that provides access to heterogeneous, geographically distributed resources, services, applications, and tools. Through such an interface, a grid-enabled portal can offer various grid solutions to users as long as the user has access to the Internet. Integrating various grid toolkits into a web-based portal framework has proven to be an effective mechanism for scientific research communities to use grid resources in the form of a science gateway [1, 2]. With a grid portal framework and proper toolkits, it is possible to develop a scientific portal in a shorter time period than before. The main effort is in developing customized services for the numerical application that is implemented in the portal. Since each application has different requirements, developing customized modular services and plugging them into the basic portal framework is the best approach, as it gives the portal a high level of flexibility and portability.

This year, NCSA initiated an international collaborative research project with KISTI in the area of cyberenvironments: developing problem solving environments for scientific applications. KISTI has put great effort into cyberenvironment research through the national e-Science project over the last couple of years, and one of the outcomes is the e-Science Aerospace Integrated Research Systems (eAIRS) [3]. The eAIRS is an application portal that offers specialized services for a CFD application developed by Seoul National University, Korea. The eAIRS became a target item for the collaboration project because its characteristics are similar to the requirements to be developed for NCSA's application. While the eAIRS successfully operates for its own CFD solver, its limited portability and other technical issues led the NCSA-KISTI development team to create a new portal framework. The most noticeable difference is that eAIRS uses its own middleware and portal framework based on GT2, while the application portal developed in this project utilizes GridSphere and Grid Portlets to ensure portability. Though the mechanisms underneath the two portal interfaces differ, both portals still share some service modules, which is a great aspect of the collaboration. The TeraGrid community was assumed to be the virtual organization for this portal project. This paper presents an overview of the application portal as well as details of the newly developed service modules for the CFD application.

2. The CFD Application The application implemented into the portal is a computational fluid dynamics (CFD) numerical model for the coaxial injector flow in the liquid rocket engine. This model has been developed through collaboration research work between NCSA and School of Aeronautics and Astronautics Engineering at Purdue University, West Lafayette, IN. The purpose of this numerical model is to investigate hydrodynamic instability of the two-phase jet flow (gas and liquid) in the coaxial type of the injector that is normally used in the Space Shuttle Main Engine (SSME) and other large scale spacelaunching rockets. In the fuel injection systems of the liquid rocket engine, gaseous fuel and liquid oxidizer are injected through the nozzle into the combustion chamber of the rocket engine. Due to many disturbance factors such as acoustic pressure variation from the chamber or intrinsic hydrodynamic instability in the flow itself, the atomization mechanism of the fuel injection promotes combustion instability during rocket engine operation.

Figure 1. Cross-sectional view of liquid jet density contour in a recessed region of the coaxial injector and frequency analysis of the jet pulsation.


The numerical model in this paper simulates the coaxial jet flow and investigates the atomization mechanism of the coaxial jet in order to obtain a better understanding of combustion phenomena in liquid rockets [4, 5, 6]. Figure 1 shows flow instability inside the injector nozzle and jet pulsation at the exit. The numerical model consists of three major parts: pre-processing, main solver, and post-processing, which is a common structure for engineering numerical models. Mesh generation, input parameter setup, and code-run configuration compose the pre-processing step. The main solver is the core of the model, and solves the flow physics in the computational domain. The solver produces a large amount of data in multiple formats. These data are visualized in the post-processing step. The workflow described here is a widely employed procedure, especially in CFD numerical models. Figure 2 shows the workflow explained in this section. Due to the short time length of the collaboration project, the portal development focused on integrating the CFD solver and automating post-processing. The pre-processing procedure will be added to the current product later on. The code is parallelized using the Message Passing Interface (MPI), and runs on a large number of processors. It is also possible to control the size and frequency of the output data files as well as the number of parameters to be printed. For production runs, it normally produces data for density, pressure, and velocities in three-dimensional Cartesian coordinates with respect to time and space. The model is also capable of check-pointing, allowing users to check for numerical divergence or other numerical problems. Typical runs of the application take 3 to 5 days with 16 to 32 processors depending on simulation requirements. The data size of the output also varies, but a typical execution produces 10 to 20 GB for each run. The application portal that we have developed integrates the numerical model into the GridSphere portal framework and enables users to control all the features mentioned above. In addition to the basic functions that are provided by GridSphere and Grid Portlets, extra service modules for the specific needs of the model have been developed by means of portlet programming. As the portal puts complex service layers and resources behind the web-based interface, an experienced user can run the code on various resources available in his/her virtual organization. The next section will discuss the architecture of the application portal.

Figure 2. Workflow diagram of the coaxial injector flow modeling application

3. Portal Architecture
The application portal employs a standard three-tiered architecture: resource layer, grid service layer, and portal interface. The architecture consists of several major elements such as a standardized portlet container based on the JSR-168 standard, grid security, file transfer, and remote execution based on the Globus Toolkit. The GridSphere framework with the Grid Portlets also employs this three-tiered architecture and provides the basic functionality mentioned above. The application portal developed in this project is based on the GridSphere framework 2.01 and Grid Portlets 1.3.0. GridSphere, a portlet container with a collection of core services and an advanced user interface library, makes developing a portal easier by employing a portlet programming approach [7, 8]. Pre-WS Globus Toolkit 4 serves as the base middleware in this architecture while still maintaining GT2 compatibility. Within the GridSphere framework, a series of customized service portlets have been developed for the specific needs of the coaxial injector application. In addition to the new services, some of the basic features have been improved as well in order to increase efficiency and ease of use. Figure 3 shows a diagram of the application portal architecture. The middle section illustrates relations between grid services and third-party service modules that are required to operate the services. Further explanation of service development and its mechanisms will be given in the next sections.

Figure 3. Three-tiered architecture of the CoaxSim Grid Portal
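To give a sense of the portlet programming approach mentioned above, the following is a minimal sketch of a JSR-168 portlet of the kind that can be added to the GridSphere container; the class name and rendered content are illustrative only and do not reproduce the actual CoaxSim service portlets.

// Minimal JSR-168 portlet skeleton; a real service portlet would gather job or
// solver information here instead of printing a placeholder.
import java.io.IOException;
import javax.portlet.GenericPortlet;
import javax.portlet.PortletException;
import javax.portlet.RenderRequest;
import javax.portlet.RenderResponse;

public class CodeCheckPortlet extends GenericPortlet {
    @Override
    protected void doView(RenderRequest request, RenderResponse response)
            throws PortletException, IOException {
        response.setContentType("text/html");
        // Entry point of the portlet render life cycle; customized services plug in here.
        response.getWriter().println("<p>Real Time Code Check placeholder</p>");
    }
}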

4. CoaxSim Grid Portal
Figure 4 shows the front page of the CoaxSim Grid portal, which gives a brief explanation of the portal development project. The basic features such as user account management, grid credential retrieval, resource registration, file management, and job submission are provided by GridSphere. Once a user logs into the portal, the first service tab is 'Profile', which includes account and service management. In the next tab, 'Grid Services', the user has a choice of proxy server for credential retrieval; myproxy.ncsa.uiuc.edu or myproxy.teragrid.org can be used as long as the resources are registered to the portal. The time length of the credential activation can be configured through the credential management portlet. If the user is an administrator of the portal, then he or she can register any resources available in the virtual organization. The resource management page under the grid service tab provides a feature with which users can examine the registered resources. The MDS 2 (Monitoring and Discovery System) service offers information regarding available services and jobs on the machine; however, MDS 2 is not at production level and the information offered is very limited at this point. Once MDS 4 is deployed onto the TeraGrid systems, it is expected to provide more useful information about the resources. Another basic feature in the grid service tab is file management. It allows users to upload and download files and move data around between the local workstation and the grid resources. The user interface for the file management service is mediocre at best, but the effort of developing another file management module was not feasible in the given time frame.

The next tab, named 'Simulations', handles job submission and status check-up for the jobs that have been submitted through the portal. Jobs are submitted by the Globus Resource Allocation Manager (GRAM) service, and users are asked to input the information required for running the job. Once the job is submitted by GRAM, users can review the job status in the job monitoring page. The GRAM service simply shows submitted, pending, active, and done status. In the CoaxSim Grid portal, an extra feature displays detailed job status information from the local scheduler. The implementation of this feature will be explained in the next section.

The functions explained so far in this section are mainly features given by GridSphere or modified versions of them. In the 'Data' tab, newly developed features that are specifically customized for the coaxial injector model have been implemented. First, the 'Real Time Code Check' page provides users with a function that displays calculated results of critical parameters in the middle of job execution. Once the user selects a job from the list, the page displays plots of the residual value from the matrix solver and total computation/communication time up to that point in the code execution. This feature is extremely useful when the code runs over a long period of time, since the residual value provides users with information on the numerical behavior of the code. The time plot also gives the user an idea of how long the code will run, and how far the code has progressed. The second function is data management. While GridSphere provides a basic file management portlet, it still requires manual labor from the user if he/she wants to move the output files to a local workstation. The data management function developed for this portal offers an automatic file transfer capability between the computational resources and a long-term storage unit (the Mass Storage System at NCSA in this case).

Figure 4. Front page of the CoaxSim Grid Portal


It also provides direct file download from the computational resources to a local workstation. Both single-file and multiple-file downloads are possible, allowing users flexible file management. Copies of saved files are extracted from the mass storage system, but the original data files remain available in the storage system after the download.

5. Service Modules Development
5.1 Grid credential retrieval module
The credential and resource management function is provided by Grid Portlets in the GridSphere framework. Once the MyProxy server is registered in resources.xml by the administrator, the user can create a session credential using an X.509 proxy certificate. Since the typical run time of the application is usually longer than the normal credential lifetime, users had to reissue the credential once the original one expired. The MyProxy team recently developed a solution to this problem while working on the EU DataGrid project. At the start of a session, users store their long-lived credentials in a dedicated MyProxy server and delegate short-lived credentials to their jobs. When a job's credential nears expiration, the Workload Management System retrieves a new short-lived credential from the MyProxy server on the user's behalf and uses it to refresh the job's credential [9]. The CoaxSim Grid portal's credential module uses this feature in order to avoid issues due to credential expiration.

5.2 Job submission and status check module
The features for job submission from GridSphere satisfy the basic needs for running this application, but the user interface was found to be not efficient enough. We have integrated the GRAM service selection page into the resource selection and job submission page, and redesigned the RSL scripting page for a better user interface.

Figure 5. Job Monitoring page with a link to CluMon and information from the PBS scheduler.


In the Job Monitoring page, we added a couple of new features in order to give more job-specific information to the users. When a job is submitted by the GRAM service, it is sent to the local scheduler, such as PBS, and GRAM generates a job ID for its own use. Unfortunately, relating globally scheduled jobs to jobs scheduled to run on a computational resource with its own scheduler has been a problem since the inception of global grids. Since the GRAM service only returns a very simple job status message, users are not able to get any information regarding how the job is processed on the local machine. The indirect method of solving this problem that we have employed is to assign a key (a unique job ID in the RSL script in this case) to the global job at submission time, and pass that key along to the local scheduler as a job attribute. Then, on the information query side, that key can be used to query and ascertain the local job name. CluMon, a cluster monitoring system developed by NCSA [10], has been utilized in this mechanism. CluMon was developed to give an overview of the current state of the computational resources at a glance. It delivers information from both the scheduler and the hosts to users through a web interface. To convey this information to the portal user, CluMon was slightly modified to recognize a global ID request and use that key to cross-reference the local job name and return the job info to the requester. The only remaining technical issue is that the local scheduler (PBS-based) currently has a bug where it does not return all of the job information on a remote query, as opposed to a query from the local machine. Once this is fixed, the mechanism should work as designed. Figure 5 shows the job monitoring page that displays job-related information extracted from CluMon.
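The key-passing scheme can be sketched as follows. The environment variable name (COAXSIM_JOB_ID) and the executable path are hypothetical examples, and the actual cross-referencing against the local PBS job is performed by the modified CluMon, not by this fragment.

// Sketch: generate a correlation key at submission time and embed it in the GT2 RSL
// as an environment variable, so the monitoring page can later look the job up.
import java.util.UUID;

public class JobSubmissionSketch {
    public static String buildRsl(String executable, int processors, String correlationKey) {
        return String.format(
            "&(executable=\"%s\")(count=%d)(jobtype=mpi)"
          + "(environment=(COAXSIM_JOB_ID \"%s\"))",
            executable, processors, correlationKey);
    }

    public static void main(String[] args) {
        String key = UUID.randomUUID().toString();  // recorded by the portal for later CluMon queries
        System.out.println(buildRsl("/u/ac/coaxsim/bin/solver", 32, key));
    }
}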

5.3 Real-time code check-up module
This module has been developed in order to allow users to check the behavior of the code during run time. At every time step, the code writes to a single file the values of physical parameters that are critical for determining the convergence of the code. The portal can pull this file whenever users want, and display the plots that are shown in Figure 6.
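A minimal sketch of how such a page can turn the per-time-step residual file into a plot is given below, assuming a simple two-column (time step, residual) file format and the classic JFreeChart 1.0 API named later in this section; file paths and the omitted error handling are illustrative only.

// Sketch: read the solver's residual file and render it as a PNG for the portal page.
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.data.xy.XYSeries;
import org.jfree.data.xy.XYSeriesCollection;

public class ResidualPlot {
    public static void render(String residualFile, String pngPath) throws Exception {
        XYSeries series = new XYSeries("residual");
        for (String line : Files.readAllLines(Paths.get(residualFile))) {
            String[] cols = line.trim().split("\\s+");   // assumed format: "<time step> <residual>"
            if (cols.length >= 2) {
                series.add(Double.parseDouble(cols[0]), Double.parseDouble(cols[1]));
            }
        }
        JFreeChart chart = ChartFactory.createXYLineChart(
            "Matrix residual", "time step", "residual",
            new XYSeriesCollection(series), PlotOrientation.VERTICAL, false, false, false);
        ChartUtilities.saveChartAsPNG(new File(pngPath), chart, 640, 480);
    }
}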

Figure 6. Real Time Code Check-up page with residual value plot and computational time per time step plot.


The right plot is the matrix residual value with respect to time, and the left plot shows computational or communication time with respect to accumulated time or each time step. The graph generation is done by utilizing a freely available Java graphics API, JFreeChart. This module was previously developed for the e-AIRS project at KISTI, and was implemented here with a slight modification. The displayed parameters can always be replaced with other choices depending on the users' requirements. This service is particularly useful to most numerical application developers, especially where the application has a check-pointing capability, because it shows the numerical behavior of the code in real time during execution.

5.4 Post-processing data management module
GridSphere provides a default file management portlet. Once the user downloads and installs the GridSphere and Grid Portlets packages, the user can use this feature out of the box. However, users of a specific numerical model usually want to have customized services for output file management, because data management for the post-processing requires specific procedures that are suitable for an automated process. The coaxial injector model produces hundreds of output files from each run, and the size of the data easily goes beyond the normal disk quota of users. The CoaxSim portal utilizes a third-party file transfer mechanism that was developed within KISTI's eAIRS project. Once the eAIRS file transfer module using the GridFTP protocol is activated, the eAIRS server keeps remotely checking the data production from the CFD solver on the computational machine. The files are transferred to long-term storage units that are already registered to the portal. At the same time, metadata for the transferred files are saved in a separate DB server. When the portal server inquires about the information on the output files on behalf of the user, the DB server passes the corresponding metadata to the portal server for display.
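The metadata-driven listing described above can be sketched as a simple query against the metadata database; the table and column names below are hypothetical, and the actual GridFTP transfer is triggered separately, only for the files the user selects.

// Sketch: ask the metadata DB which output files a run produced and where they were archived.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class OutputFileCatalog {
    public record FileEntry(String name, long sizeBytes, String archiveUrl) {}

    public static List<FileEntry> listRunOutputs(String jdbcUrl, String runId) throws Exception {
        String sql = "SELECT file_name, size_bytes, archive_url "
                   + "FROM output_files WHERE run_id = ? ORDER BY file_name";
        List<FileEntry> entries = new ArrayList<>();
        try (Connection c = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = c.prepareStatement(sql)) {
            ps.setString(1, runId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    entries.add(new FileEntry(rs.getString(1), rs.getLong(2), rs.getString(3)));
                }
            }
        }
        return entries;  // the GridFTP download itself happens only for files the user selects
    }
}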

Figure 7. Data Management page for output file browsing and single/multiple file download


The actual file transfer from the storage system to the local workstation occurs only when the user decides to download a file to the local workstation. This mechanism saves time and avoids network performance bottlenecks, since only the metadata are needed before the actual file transfer. The eAIRS server enables third-party file transfer activity between the computational resource and the storage system, and also between the storage systems and the local workstation, by using the GridFTP protocol.

6. Summary and Discussion
CoaxSim Grid, an application portal for a CFD numerical solver, has been developed through a collaborative project between NCSA and KISTI. The CoaxSim Grid portal developed in this project utilizes the basic portal features of the GridSphere framework and the Grid Portlets package from GridLab. The main focus of the development of CoaxSim Grid was to provide customized services for the coaxial injector modeling application. Rather than asking users to get accustomed to what is given with current grid technology, proactively developing services around users' needs and the requirements of the application in the context of the grid portal can bridge the gap between engineering researchers and computer science technology. The architecture presented in this paper is based on the concept that the portal server is a container of service clients that are designed according to the portlet component model. The use of a standardized portlet component model enables fast development of a customized portal for a given application. It also allows plug-in-style service module development, which guarantees flexibility and portability of the portal contents. By having a unified user interface for using all types of grid computing resources in the TeraGrid community, the CoaxSim portal could accelerate the pace of research production of user groups when it is positioned as a research-community-oriented problem solving environment. Challenges remain, however, because the portal still requires a high level of understanding of grid technology and a lot of programming work for service development. Keeping up with the rapidly changing standards in grid computing technology is another barrier for engineering and science research groups. The CoaxSim Grid portal in this project is an example of overcoming those obstacles through collaboration between a computer science group and an engineering application group. Though there are still many areas to work on in order to create more sophisticated grid computing solutions for generic scientific applications, the CoaxSim Grid project has shown a possible way of developing reusable grid services using currently available grid technologies.

Acknowledgement The authors would like to acknowledge supercomputer time provided by the National Center for Supercomputing Applications and financial support from Korea Institute of Science and Technology Information.

References
[1] M.P. Thomas, J. Burruss, L. Cinquini, G. Fox, D. Gannon, L. Gilbert, G. von Laszewski, K. Jackson, D. Middleton, R. Moore, M. Pierce, B. Plale, A. Rajasekar, R. Regno, E. Roberts, D. Schissel, A. Seth and W. Schroeder, "Grid Portal Architecture for Scientific Applications," Journal of Physics: Conference Series 5. Accepted for publication.
[2] M.P. Thomas, M. Dahan, K. Mueller, S. Mock, C. Mills and R. Regno, "Application Portals: Practice and Experience," Grid Computing Environments: Special Issue of Concurrency and Computation: Practice and Experience, 2002, V.14:1427-1444.
[3] e-AIRS project, http://obiwan.kisti.re.kr/escience/eairs
[4] B. Kim, S. D. Heister, "Numerical Modeling of Hydrodynamic Instability of Swirl Coaxial Injectors in a Recessed Region," 42nd AIAA/ASME/SAE/ASEE Joint Propulsion Conference, Sacramento, CA, 2006.
[5] B. Kim, S. D. Heister, S. H. Collicott, "Three-Dimensional Flow Simulations in the Recessed Region of a Coaxial Injector," Journal of Propulsion and Power, Vol. 21, No. 4, pp. 728-742, 2005.
[6] B. Kim, S. D. Heister, "Effect of Chamber Pressure Variation on High-Frequency Hydrodynamic Instability of Shear Coaxial Injector," 40th AIAA/ASME/SAE/ASEE Joint Propulsion Conference, Fort Lauderdale, FL, 2004.
[7] J. Novotny, M. Russell and O. Wehrens, "GridSphere: An Advanced Portal Framework," http://www.gridsphere.org/gridsphere/wp4/Documents/France/gridsphere.pdf
[8] J. Novotny, M. Russell and O. Wehrens, "GridSphere: A Portal Framework For Building Collaborations," http://www.gridsphere.org/gridsphere/wp4/Documents/RioBabyRio/gridsphere.pdf
[9] D. Kouril and J. Basney, "A Credential Renewal Service for Long-Running Jobs."
[10] CluMon, http://clumon.ncsa.uiuc.edu/


Secure Federated Light-weight Web Portals for FusionGrid
D. Aswath,1 M. Thompson,2 M. Goode,2 X. Lee,1 N.Y. Kim1

1 General Atomics, San Diego, California
2 Lawrence Berkeley National Laboratory, San Francisco, California

Abstract The FusionGrid infrastructure provides a collaborative virtual environment for secure sharing of computation, visualization and data resources over the Internet to support the scientific needs of the US magnetic fusion community. Invoking FusionGrid computational services is typically done through client software written in, for historical reasons, the commercial language IDL. Scientists use these clients to prepare input data and launch FusionGrid computational services. There are also numerous web sites throughout the US dedicated to fusion research, functioning as light-weight single purpose portals. Within the FusionGrid alone, there are web sites associated with authentication, authorization, and monitoring of services. Pubcookie and MyProxy technology were used to federate these disparate web sites by enabling them to authenticate a user by their FusionGrid ID and then to securely invoke FusionGrid computational services. As a result of this drop-in authentication mechanism, portals were created that allow easier usage of FusionGrid services by the US fusion community. The shared authentication mechanism was accomplished by the integration of Pubcookie’s single sign-on mechanism with the MyProxy credential repository that was already in use by the FusionGrid. This paper will outline the implementation of the FusionGrid portal technology, discuss specific use cases for both invoking secure services and unifying disparate web sites, present lessons learned from this activity, and discuss future work.

I. Introduction
The National Fusion Collaboratory Project [1] [2] teams the three major U.S. fusion physics research centers: the Princeton Plasma Physics Lab, General Atomics (GA), and the MIT Plasma and Fusion Science Center, with collaborators from the computer science groups at Princeton, Argonne National Labs (ANL), Lawrence Berkeley National Labs (LBNL) and the University of Utah. This project created a national Fusion Energy Sciences Grid (FusionGrid) [3] to provide new capabilities to fusion scientists to advance fusion research. FusionGrid is a system for secure sharing of computation, visualization, and data resources over the Internet. The FusionGrid goal is to allow scientists at remote sites to fully participate in experimental and computational activities as if they were working at a common site, thereby creating a virtual organization (VO) of the US fusion community. The Grid's resources are protected by a shared security infrastructure including strong authentication to identify users and fine-grain authorization to allow stakeholders to control their resources. FusionGrid uses the X.509 certificate standard and the FusionGrid Certificate Authority (CA) to implement a Public Key Infrastructure (PKI) for secure communication.

Fundamental to the deployment of FusionGrid into the everyday working environment of US scientists is the usage of the web browser client to deliver some of FusionGrid's capabilities. Such web browser functions include a Fusion Grid Monitor (FGM) [4], hosted at General Atomics, for monitoring the execution of FusionGrid jobs, and a preliminary site hosted at LBNL [5] for user registration and management. Combining these new capabilities with the numerous existing US fusion web sites that contain documentation and other information relevant to performing science on FusionGrid has resulted in a large number of web servers spread across the US that serve some aspect of FusionGrid functionality. A separate project investigated the usage of a Java portal, but having a single general-purpose portal did not correspond to the realities of the highly distributed VO with a significant number of legacy web sites.

Access to FusionGrid's computational services is done through client programs that depend on the Globus Secure Infrastructure (GSI) [6] to do secure data access and secure job submission.


These client programs have been written in the Interactive Data Language (IDL) [7], a commercial software analysis and programming language that is very commonly used within the experimental US fusion community. There are two problems posed by this solution: the Globus Toolkit [8] is not available on Windows and requires a fairly complicated installation procedure on UNIX, and IDL is not available on every potential client machine, since it requires buying a license for each host. Thus a simple web interface that would allow data marshalling and job submission is desirable, as it would allow easy client usage from any web-browser-capable computer. This web interface also has to be able to leverage a single sign-on authentication scheme to get a proxy certificate for the scientist, which is required by the grid middleware for remote job submissions and data access.

II. Related Work
The majority of the community's work on creating scientific portals has been done in Java, leveraging some popular containers for Java servlet code such as Apache Tomcat [9], Jetty [10], JBoss [11], WebSphere [12], and the Java portlet specification released in October 2003 [13]. The goal of such a portal is to provide a single point of entry to all the functions of a VO, and some of the commonly provided functions include shared spaces such as chat, calendar and newsgroups, whiteboards, shared applications and group authorizations. Grid portal containers such as OGCE [14] and GridSphere [15] also provide X.509 authentication, grid-style job submission and grid data transfers. To provide a GUI interface for marshalling data and setting parameters for running a particular code, and to further encapsulate the code within such a portal, the developers must be knowledgeable about the scientific code being called and the tools and libraries that are provided by the portal. Typically, portals are designed to run centrally at one site, providing access to all of the VO services. Installing and maintaining an integrated portal is a non-trivial undertaking, complicated by the fact that state-of-the-art portals are large, rapidly evolving software projects, based on frequently changing third-party software, e.g. portlet containers, portlet standards and authentication approaches. The National Fusion Collaboratory Project worked with the developers of an OGCE portal to deploy a full-service grid portal for FusionGrid. While deployment of the provided portal framework was accomplished, the handoff of maintenance and further development to fusion scientists, who were neither Java programmers nor portal experts, was not successful. Addition of a new code or service could only be done by users who understood the code to be executed and its portal environment, and who had administrative access to the site at which the portal was run. In practice, the fusion scientists who understood how to execute a code lacked the portal expertise to integrate their interface into the portal.

Another common approach to hiding the complexities of running a scientific code is to first wrap the code with a simple command-line or GUI interface, prompting for the required parameters, and then run the command as a Common Gateway Interface (CGI) script behind a web page, thereby creating a single-purpose portal. To launch the code as a grid service, this approach must authenticate the user as a member of the VO permitted to run the service and subsequently retrieve a grid proxy credential on behalf of the user, for use in the Globus job submissions. Pubcookie [16] is an open-source software package that uses cookies and a central secure login server to enable a set of trusted web sites to effect a single, authenticated sign-on to all the web sites. The login server is the only site that needs to see the user's password for authentication. The authenticated user's ID is conveyed to the other web sites in encrypted cookies that can only be decrypted by the login server and the targeted sites. Notably, the user or anyone who gains possession of the cookie cannot alter its contents without invalidating it. Pubcookie is implemented on each of the trusted web sites as an Apache module. The login server allows the authentication mechanism to be provided as a plug-in module, enabling the deployer to decide if user names and passwords are to be kept in a simple database, an LDAP server, Kerberos or some other means. Additionally, MyProxy [17] is a standard open-source grid server that provides X.509 proxy credentials, suitable for use in GSI transactions, when provided with a user name and either the user's password or a trusted credential. As FusionGrid was already running a MyProxy server as part of its centralized certificate management service, combining it with Pubcookie was an obvious approach to provide an authentication and delegation service that allows existing single-purpose web sites to authenticate FusionGrid members and securely access remote data and submit jobs through the GSI infrastructure.

2

single sign-on across all portals a user might require; to get the necessary grid credentials that enable the client-side software to make a GSIenabled call to a FusionGrid service, and to provide access to the Globus software from within the portal.

authenticate FusionGrid members and securely access remote data and submit jobs, through the GSI infrastructure. Other work has also been done to integrate these two packages. The MyProxy server has an authentication plug-in module, which allows it to authenticate a user via a Pubcookie cookie. The National Virtual Observatory has done a similar integration of MyProxy, Pubcookie and PURSe (Portal-based User Registration System) [18] to provide proxy certificates to its portals.
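As an illustration of the password-based delegation path described above, the sketch below shows how a client-side script might obtain a short-lived proxy from a MyProxy server using the standard myproxy-logon command-line client. It is a minimal sketch only: the host name, lifetime and output path are placeholder assumptions, not values taken from the FusionGrid deployment.

```python
import subprocess

def fetch_short_term_proxy(gridid, myproxy_host="myproxy.example.org",
                           lifetime_hours=12, out_path="/tmp/x509up_portal"):
    """Delegate a short-lived proxy from a MyProxy server.

    Runs the standard `myproxy-logon` client, which prompts for the user's
    MyProxy pass-phrase on the terminal. Host name, lifetime and output
    path here are illustrative placeholders.
    """
    cmd = [
        "myproxy-logon",
        "-s", myproxy_host,          # MyProxy server to contact
        "-l", gridid,                # account (gridId) holding the credential
        "-t", str(lifetime_hours),   # requested proxy lifetime in hours
        "-o", out_path,              # where to write the delegated proxy
    ]
    subprocess.run(cmd, check=True)
    return out_path

if __name__ == "__main__":
    proxy_file = fetch_short_term_proxy("jsmith")
    print("Delegated proxy written to", proxy_file)
```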

III. Federated Web Portals

A. Approach

As described in the previous two sections, the National Fusion Collaboratory Project aimed to provide browser-accessible GUIs for job submission on the FusionGrid. These single-purpose portals would lead the user through the data preparation stages, explain and set parameters, record input for future reference or reuse, invoke the service, monitor the process and make results available to the user. To succeed within the FusionGrid environment, such interfaces must be written by the service provider in a language of their choice, requiring minimal additions to a standard Apache web server installation. The major challenges in providing such single-purpose portals are the ability to provide single sign-on across all portals a user might require, to obtain the grid credentials that enable the client-side software to make a GSI-enabled call to a FusionGrid service, and to provide access to the Globus software from within the portal.

B. Overview

We combined several existing software modules to provide a Federated Portal Framework: a Pubcookie module providing single sign-on for the set of trusted web servers; a MyProxy server handling the storage of long-term credentials and the delegation and storage of short-term proxies needed for GSI [4]; and the Credential Manager handling the registration of users and the management of long-term credentials. The combination of the four components (Credential Manager, Pubcookie login server, and two instances of MyProxy, one for long-term and another for short-term credentials) is referred to as the Authentication and Delegation Service (ADS). These servers are co-located on the same host, so that the connections between them are automatically secure. The FusionGrid authorization server, ROAM [19], is used by a FusionGrid service to check a user's access to a specific grid resource based on the Common Name (CN) in the user's X.509 certificate. Figure 1 presents an overview of the architecture and how it is used.

Fig. 1. Federated Portal Architecture.


The infrastructure for the portal architecture consists of:
• a set of servers running on a secure and trusted host (the ADS);
• a set of trusted web interfaces that support HTTPS, cookies and an Apache Pubcookie module;
• the FusionGrid authorization server, ROAM.

C. Credential Manager

A user registering with the FusionGrid chooses a login ID, called the gridId, used for subsequent logins. The first and last name provided as part of the registration process define the Common Name (CN) that forms part of the X.509 credential issued to the user. The Credential Manager enters these long-term credentials into a MyProxy server, indexed by gridId and encrypted with the user's pass-phrase.

D. Pubcookie

The Pubcookie framework consists of an Apache module deployed on each of the trusted web portals and a central login server providing the basic multi-site single sign-on capability for web sites in the same domain, e.g. fusiongrid.org. When a user connects, the Pubcookie module looks for a session cookie; if a signed cookie containing the user's gridId is present, it knows that the user has been authenticated. In the absence of such a cookie, it redirects the request to the Pubcookie login server. The first time a user is redirected to the login server, they are presented with a login form prompting for a gridId and password. The login server authenticates the user with the gridId and password provided and returns two cookies: a granting cookie scoped to reach the web portal that was originally contacted, and a login cookie scoped to be returned to the login server on access to any other web portal.

When a different portal is first contacted, the redirection to the login server carries the login cookie. The login server uses this cookie to authenticate the user without prompting for a password again. It then generates a new granting cookie, which is returned to the second web portal.

E. MyProxy server with Pubcookie

FusionGrid runs two MyProxy servers: a MyProxy CredentialStore and a MyProxy ProxyStore. The first MyProxy server stores long-term credentials in the CredentialStore. For FusionGrid jobs that are submitted directly by a user, a short-term proxy is delegated from the CredentialStore using the gridId and password provided by the user.

A second MyProxy server stores short-term proxies in the ProxyStore to support proxy renewal by long-running FusionGrid services. These proxies can be used for delegation by a trusted server presenting its own X.509 credential. This style of delegation can be used by portals to obtain the proxy certificate of an authenticated user and submit Globus jobs on their behalf. Since Pubcookie uses MyProxy to authenticate the gridId and password via a myproxy-login interface, it is natural to store the resulting proxy in the short-term ProxyStore. These proxies are set to allow delegation only by the trusted web portals. Thus, when a web portal needs a proxy certificate for a Globus request, it can contact the short-term MyProxy server to get one. This requires that each web portal have its own X.509 service certificate registered with the ADS. To coordinate the Pubcookie password authentication with that of MyProxy, an authentication plug-in was added to Pubcookie that calls MyProxy to check the password. A side effect of this call is the issuing of a proxy credential.

The process of securely launching FusionGrid computational codes with the Federated Portal Architecture has the following steps, as shown in Fig. 2:

1. The user connects to the web portal to launch a specific FusionGrid service.
2. If the user has not previously authenticated, the Pubcookie module redirects the request to the Pubcookie login server.
3. The Pubcookie login server requires the user to sign on with FusionGrid credentials.


Fig. 2. Details of Authentication and Delegation.

4. The user is authenticated with the ADS server and a short-term proxy is delegated from the stored long-term credentials.
5. The short-term proxy is placed in the secondary MyProxy server for subsequent portal retrieval.
6. Upon successful authentication, the user is sent a 'redirect' page and is granted a login cookie. The login cookie is used on any subsequent visits by the user to the login server (single sign-on capability).
7. The 'redirect' page causes the user's browser to re-connect to the original web portal, this time with the granting cookie.

IV. Web Portal Use Case

ONETWO [20] is a time-dependent magnetic fusion analysis and simulation code that is available as a computational service on the FusionGrid. This grid-enabled code, running on a cluster of Linux machines, can be invoked directly at General Atomics (GA) locally or remotely via FusionGrid, as shown in Fig. 3.

Fig. 3. ONETWO as a FusionGrid service.


AUTOONETWO and PREONETWO are IDL-based client-side GUI tools hosted at GA that help scientists prepare ONETWO runs on the FusionGrid. Though they differ in the way inputs are gathered and processed and in the specific scientific problem they address, these GUI tools manage code runs with a code run database and upload the prepared inputs to the ONETWO computational service when requesting a new code run, thereby reducing the work required by the user to launch FusionGrid code runs. However, these client programs depend on both a Globus infrastructure [8] and a commercial IDL license being available on every host machine. With the ongoing effort to provide a simple web interface allowing easy client usage from any machine with a web browser, we have developed a web portal (Fig. 4) based on the Federated Portal Architecture described above to enable authenticated FusionGrid users to securely invoke the ONETWO computational service on the FusionGrid.

When clients attempt to access a Pubcookie-protected web page hosted by the web portal through their browsers, they are prompted for their gridId and password by the login server at cert.fusiongrid.org. Upon successful authentication, a proxy is delegated from the MyProxy server for later retrieval by the portal and the user is redirected to the originally requested page. The portal uses its host credentials to retrieve the proxy certificate for the authenticated user to start the ONETWO run. The portal queries the ROAM authorization server to check whether the authenticated user has permission to access the ONETWO resource. Authorized users are presented with the option to gather and process inputs as shown in Fig. 5. The ONETWO code has hundreds of different input settings, but this initial version of the portal interface exposes only the most commonly changed ones for user adjustment. As use of the portal grows, more inputs will be added. The general inputs include which fusion plasma shot to analyze, what time range within that shot, and an optional text comment string. The advanced input section includes: where to get the plasma shape (EFIT ID), the plasma temperature and density profiles (ZIPFITS and Profile directory), the input template file that specifies all possible inputs (INONE Template), and specifics about the auxiliary heating of the plasma (NBI and ECH). The inputs thus prepared for a desired shot are inserted into a database and subsequently retrieved by the ONETWO computational code during its run. The run can be monitored through the FusionGrid Monitoring system (FGM) as shown in Fig. 6. The user can then access the results of the run, stored in an MDSPlus [21] data repository identified by the FGM logs.

Fig. 4. Web portal hosting the web page to launch the ONETWO service on the FusionGrid.


Fig. 5. ONETWO Input Preparation for FusionGrid users authorized to access ONETWO.

Fig. 6. FusionGrid Monitor (FGM) logs monitoring a ONETWO run with a run id of 1190. Results of this run are stored in the MDSPlus tree AOT06.
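Putting the pieces described in Sections III and IV together, the portal-side launch logic can be summarized roughly as follows. This is a sketch only: the `myproxy`, `roam`, `rundb` and `onetwo` adapter objects are hypothetical stand-ins for the MyProxy retrieval, ROAM query, run-database insert and GSI job submission; none of them is part of a real FusionGrid API.

```python
def launch_onetwo(gridid, inputs, myproxy, roam, rundb, onetwo):
    """Sketch of the portal-side launch sequence; all adapters are hypothetical."""
    # Use the portal's host credential to retrieve the short-term proxy
    # that was delegated for this user at login time.
    proxy = myproxy.retrieve(gridid)

    # Ask ROAM whether the certificate's Common Name may use ONETWO.
    if not roam.permits(proxy.common_name, resource="ONETWO"):
        raise PermissionError(f"{gridid} is not authorized for ONETWO")

    # Store the prepared inputs (shot, time range, EFIT ID, ...) so the
    # computational code can pick them up during its run.
    run_id = rundb.insert(inputs)

    # Submit the grid job on the user's behalf; progress is then visible
    # through the FusionGrid Monitor (FGM).
    onetwo.submit(run_id, proxy)
    return run_id
```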


V. Discussion and Concluding Remarks

With a straightforward implementation, the Federated Web Portal worked as expected to authenticate, authorize and provide a proxy credential for the user, and we were successful in launching the ONETWO computational code as a secure grid service on the FusionGrid via the portal. With the MDSPlus repository storing the outputs of the code runs, output visualizations are currently presented to the fusion scientists with existing GA software tools, such as ReviewPlus, run locally rather than through the portal. As the scientists have only recently begun using the portals, we have yet to determine the types of physics codes that can be ported to our portal framework. Our preliminary experience shows that codes requiring intensive visualization during input preparation are not well suited to access via our web portals, whereas codes that do not require extended graphics, such as ONETWO, are well suited to the proposed portal architecture. Roughly six months of use should allow us to make a firmer judgment on the best use of this technology. It is expected that, with a straightforward procedure for a portal site to authenticate, authorize, and obtain and use a proxy, the significant work in creating a code portal will be in presenting a convenient and intuitive interface for input and output to FusionGrid services.

For our future work, to further enable visualization of code run outputs, we plan to integrate Elvis [22], a scientific visualization package that allows users to view graphs in a browser window. As the Federated Portal approach requires each of the web portals to have a fusiongrid.org alias in addition to its primary name, we would like to eliminate this additional requirement. Having migrated to a Wiki-based web site for the DIII-D fusion facility and a Bugzilla system to track requests from users on software updates and possible bugs, we will examine the use of FusionGrid credentials for the login scheme. As the DIII-D wiki site requires the user's login ID to be tracked to monitor edits on the web pages, this Pubcookie model of authentication with the X.509 FusionGrid credentials would not only secure the DIII-D wiki pages, but would also eliminate the need for users to remember a separate set of login IDs and passwords to be able to access and edit such pages, and make requests via Bugzilla.

Acknowledgment

This work was funded by the SciDAC project, U.S. Department of Energy, under contract DE-FG02-01ER25455 and by the Director, Office of Science, Office of Advanced Science, Mathematical, Information and Computation Sciences of the U.S. Department of Energy under contract number DE-AC02-05CH11231. The authors wish to thank David Schissel for his valuable suggestions on this paper.

References

[1] D.P. Schissel, et al., "Building the US National Fusion Grid: Results from the National Fusion Collaboratory project," Fusion Eng. and Design 71, 245 (2004).
[2] D.P. Schissel, et al., "The National Fusion Collaboratory Project: Applying Grid Technology for Magnetic Fusion Research," Proceedings of the Workshop on Case Studies on Grid Applications at GGF10 (2004).
[3] The National Fusion Collaboratory, http://www.fusiongrid.org.
[4] S.M. Flanagan, J.R. Burruss, C. Ludescher, D.C. McCune, Q. Peng, L. Randerson, D.P. Schissel, "A General Purpose Data Analysis System with Case Studies from the National FusionGrid and the DIII-D MDSPlus between pulse analysis system."
[5] J.R. Burruss, T.W. Fredian, M.R. Thompson, "Simplifying FusionGrid Security," Challenges of Large Applications in Distributed Environments (CLADE) Workshop at HPDC-14, July 2005, Research Triangle Park, NC.
[6] V. Welch, F. Siebenlist, I. Foster, J. Bresnahan, K. Czajkowski, J. Gawor, C. Kesselman, S. Meder, L. Pearlman, S. Tuecke, "Security for Grid Services," Twelfth International Symposium on High Performance Distributed Computing (HPDC-12), IEEE Press, June 2003.
[7] The Data Visualization and Analysis Platform (IDL), http://www.ittvis.com/idl/.
[8] Globus Toolkit, http://www.globus.org/toolkit/docs/2.4/.
[9] Tomcat, http://tomcat.apache.org/.
[10] Jetty, http://www.mortbay.org/.
[11] JBoss, http://labs.jboss.com/portal/jbossportal.
[12] WebSphere, http://www-306.ibm.com/software/websphere/.
[13] Java Portlets, final release October 2003, http://jcp.org/aboutJava/communityprocess/final/jsr168/.
[14] OGCE, http://www.collab-ogce.org/ogce2/.
[15] GridSphere, http://www.gridsphere.org/gridsphere/gridsphere.
[16] Pubcookie, http://www.pubcookie.org/.
[17] J. Basney, M. Humphrey, and V. Welch, "The MyProxy Online Credential Repository," Software: Practice and Experience, Volume 35, Issue 9, July 2005, pages 801-816; also http://grid.ncsa.uiuc.edu/myproxy/.
[18] M. Freemon, http://grid.ncsa.uiuc.edu/myproxy/talks.html.
[19] J.R. Burruss, T.W. Fredian, M.R. Thompson, "ROAM: An Authorization Manager for Grids," to appear in fall 2006 in the Journal of Grid Computing.
[20] W. Pfeifer, R.H. Davidson, R.L. Miller, and R.E. Waltz, General Atomics Report GA-A16178 (1980).
[21] J.A. Stillerman et al., "MDSPlus," Rev. Sci. Instrum. 68, 939 (1997).
[22] Elvis, http://w3.pppl.gov/elvis/.


Portal-based Support for Mental Health Research

David Paul 1, Frans Henskens 1, Patrick Johnston 2 and Michael Hannaford 1

1 School of Electrical Engineering & Computer Science, The University of Newcastle, N.S.W. 2308, Australia
2 Centre for Mental Health Studies, The University of Newcastle, N.S.W. 2308, Australia

Abstract. This paper describes experiences with the use of the Globus toolkit and related technologies for development of a secure portal that allows nationally-distributed Australian researchers to share data and application programs. The portal allows researchers to access infrastructure that will be used to enhance understanding of the causes of schizophrenia and advance its treatment, and aims to provide access to a resource that can expand into the world’s largest on-line collaborative mental health research facility. Since access to patient data is controlled by local ethics approvals, the portal must transparently both provide and deny access to patient data in accordance with the fine-grained access permissions afforded individual researchers. Interestingly, the access protocols are able to provide researchers with hints about currently inaccessible data that may be of interest to them, providing them the impetus to gain further access permissions.

1 Introduction

Schizophrenia is a brain disease that affects approximately 0.6-1.5% of the population, with an incidence of 18-20 cases per 100,000 per year [9]. Although prevalence is low, the burden of the illness upon society and upon sufferers and their families is extremely high. The World Health Organisation, for example, rates schizophrenia amongst the ten leading causes of disease burden. The disorder involves severe cognitive, affective and perceptual dysfunctions which, at an overt behavioural level, manifest themselves in terms of delusional beliefs and disorganised behaviours; perceptual disturbances including, particularly, auditory hallucinations; and lack of motivation and a general decline in personal and social functioning. Consequently, it is a disease associated with very high costs to government (AUD35,000 per patient per year) [1] and extremes of social impoverishment and economic disadvantage [10].

Recent scientific advances have led to a model of schizophrenia that recognises the role of abnormal neuro-developmental and/or neurodegenerative processes in altering the structure and function of the brain. Until relatively recently, detailed images of cerebral morphology could only be obtained from post-mortem tissue. The limitations of the traditional tissue-based approach to neuropathology can potentially be overcome through the use of neuroimaging technologies. Neuroimaging techniques offer the potential for in vivo studies of brain structure as well as function, thus overcoming problems relating to tissue degeneration post-mortem, the invariably small samples of post-mortem brains and, of course, the obvious fact that the tissue is derived from deceased persons. Moreover, techniques such as magnetic resonance imaging (MRI) allow for repeated testing of the same individuals, and thus longitudinal studies may be undertaken. A further advantage of MRI is that it may be employed to produce high-resolution three-dimensional digital representations of brain structure. This approach lends itself more easily to the sharing and distribution of the primary source data (i.e. digital images) among research teams than do traditional approaches in neuropathology (where the brain tissue itself is the primary source data). It also supports the application of computational image processing techniques for the precise definition, localisation and measurement of brain structures.

The heritability of schizophrenia is of the order of 70-80%. However, the inheritance pattern is not the classical Mendelian type. As with other complex diseases (e.g. diabetes, cardiovascular disease), it is believed to involve a number of contributing genes, each of small effect, interacting with each other and with environmental factors. With this in mind, traditional genetic research approaches based on the diagnostic category of schizophrenia need to be modified if we are to further our understanding of the genetic basis of this disease. A more recent approach in schizophrenia research has been to investigate discrete neurobiological or neurocognitive characteristics that may be more closely linked to a particular gene [8, 12] rather than the clinical syndrome diagnosed as schizophrenia. These characteristics, known as endophenotypes, can assist researchers in unravelling the complex genetic causality of schizophrenia and help to identify individuals who carry the genetic trait for these discrete deficits [20].

The NISAD/LONI Virtual Brain Bank [14] primarily consists of a large distributed database of high-resolution 3D computer representations of the brains of approximately 250 schizophrenia patients and age/gender-matched healthy control subjects, derived from structural MRI images and transformed into a standardised spatial coordinate system. The purpose of this bank is to provide a resource for the analysis of subtle structural variations between the brains of schizophrenia patients and healthy controls, and to map brain changes that occur as a result of variables such as age, gender, duration of illness and duration of untreated psychosis. The brain bank also provides the opportunity to explore associations between brain structure and clinical or neurocognitive measures, gene expression or genetic linkage data, and functional measures of brain activity such as functional MRI (fMRI) or event-related potentials (ERPs).

A further example of the significant impact of data-access-enabling infrastructure on research was the National Institute for Schizophrenia and Allied Disorders (NISAD) [15] Schizophrenia Research Register. It was intended that the Virtual Brain Bank would act as the foundation to which could later be added putative endophenotype measurements derived from the Schizophrenia Register participants and other neurocognitive studies of schizophrenia, as well as genetic information derived from the DNA Bank and the Laboratory of Neuro Imaging (LONI) [13]. Such integrative strategies, combining various methodological approaches, have been shown to considerably further the understanding of the pathology of schizophrenia. The recently established Australian Schizophrenia Research Bank (ASRB) builds on and extends the ideas of such previous facilities to create a nationally accessible resource for schizophrenia researchers in Australia and beyond.

In this paper we describe and discuss issues in the use of primarily Globus-based [4] technology to build a grid [3] that allows geographically distributed researchers to contribute to, initially, the NISAD/LONI Virtual Brain Bank, and now the encompassing ASRB collection of schizophrenia-related data and software resources, in the quest for knowledge on the causes of and treatments for schizophrenia.

This research is supported by The Australian Research Council (ARC) grant SR0566756 (2005-2006). On-going work is supported by the National Health & Medical Research Council (NHMRC) grant AIP/ERP #1679 (2006-2010), and by a grant from the Pratt Foundation (2007-2011).

2 The ASRB Grid

A major issue for schizophrenia research is the expense of the collection of patient data (e.g. MRI brain scans, tissue samples) needed for analysis. The ASRB will have a major impact on schizophrenia research in Australia because it will amortise the high cost and the significant time involved in obtaining data across the national body of researchers. As schizophrenia is likely to involve multiple genes of small effect, access to large sample sizes is key to undertaking studies of sufficient statistical power. With its cross-referenced data in clinical, cognitive, neuroanatomical and genetic domains, the ASRB will make a huge contribution to schizophrenia research on a national scale, enabling multiple research questions to be addressed relatively easily in a large sample that would otherwise be inaccessible or prohibitively expensive for independent investigators to acquire. This large data set will be formed by merging existing data held by groups around the country, and supplementing it with data obtained by a concerted recruitment and collection process.

As the ASRB Grid contains personal patient information, security is of vital importance. Typical Grids require strong security to determine whether a user should have access to a given system, or set of systems, without the need for any fine-grained security: a user is either allowed to access the system or they are not. The ASRB Grid is different because users have different access rights to the resources provided by the Grid, even those on an individual component system. Further, a researcher should be able to perform a preliminary query on data for which they are not currently authorised, allowing them to identify data of interest as a precursor to a request for access to it. For example, it should be possible for the researcher to search for scans exhibiting particular features to determine whether there are sufficient samples to justify requesting access to them. If there were insufficient data items matching their query, it would be a waste of time and resources to request access to the data. If, on the other hand, a sufficiently large extant data set were found (albeit currently unavailable to the individual researcher), a request for access to that existing data would likely be significantly easier (and less expensive) to achieve than collection of new data. Notwithstanding, it is essential that certain aspects of the data, especially information that can identify patients, be inaccessible to any user who has not been given specific rights to access it.

Ethics approvals are necessarily associated with the collection of data and samples from live patients. Such ethics approvals typically specify the project for which data is to be used, and limit the group of researchers who can access the data to, for example, those at a particular institution or in a particular research group. It is also common that most researchers permitted to use and analyse patient data are prevented from being able to identify patients from their data (i.e. the data is de-identified). The extant data collections currently held at the disparate Australian member sites are all subject to existing ethics approvals. Access to the new patient data, for which collection has been funded by the NHMRC, is similarly controlled. Thus, a major and important aim of the ASRB Grid is to provide controlled access to the data available to each particular user of the Grid. The most obvious need is to allow all authorized users to access the newly collected data, but it is also important to allow access to any other data collections for which the user has approval, either through their institution, research group, or personally. A further consideration is that it should be possible for selected personnel to identify patients from their data in the circumstance that analysis has discovered potentially beneficial treatments for those patients.
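The preliminary-query behaviour described above can be illustrated with a small sketch: an unauthorised researcher sees only an aggregate count of matching records, while an authorised one sees the (de-identified) records themselves. The toy data model and field names are invented for illustration and do not reflect the actual ASRB schema.

```python
# Illustrative only: a toy in-memory catalogue, not the ASRB data model.
SCANS = [
    {"scan_id": 101, "collection": "siteA", "diagnosis": "schizophrenia", "age": 24},
    {"scan_id": 102, "collection": "siteA", "diagnosis": "control",       "age": 31},
    {"scan_id": 103, "collection": "siteB", "diagnosis": "schizophrenia", "age": 40},
]

def query_scans(researcher_collections, predicate):
    """Return full records for collections the researcher may access,
    but only a count of matches for everything else."""
    visible, hidden_matches = [], 0
    for scan in SCANS:
        if not predicate(scan):
            continue
        if scan["collection"] in researcher_collections:
            visible.append(scan)      # authorised: full (de-identified) record
        else:
            hidden_matches += 1       # unauthorised: contributes to a count only
    return visible, hidden_matches

records, hint = query_scans({"siteA"}, lambda s: s["diagnosis"] == "schizophrenia")
print(len(records), "accessible records;", hint, "further matches exist elsewhere")
```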

Once the researcher has the data needed for their experiment, they typically would execute computer programs to analyse this data. At present this can involve manually collecting the data into a compressed archive, sending it to, for example, Los Angeles via FTP, and waiting for the results to be returned. At the remote processing site, a user must extract the data, schedule it for analysis, collect the results and then return them to the initiating researcher. Other less compute-intensive tasks can be controlled by a single user, though these still require manual scheduling on computers in Australia, which can be time consuming, increasing the time needed by the researcher to do their job. It is intended that by utilizing compute servers in the ASRB Grid, this hands-on approach to computer-based analysis can be reduced, with researchers simply submitting the job to the Grid, after which the Grid automatically schedules and runs the job, collects the results, and returns them to the researcher, with no further human interaction required.

A final and important requirement of the ASRB Grid is that it should be easy to use, and provide reasonable performance and feedback. If the user interface to the new infrastructure is too complex, or if the performance is pedestrian, users will prefer to continue using the familiar old methods, with all their problems. Thus, use of the new system must be as intuitive as possible, and should hide or abstract over all unnecessary complexity. This means that sensible defaults should be chosen for all options, and a consistent interface should be provided to enable the researchers to concentrate on their research rather than being caught up dealing with the vagaries of the computer support.

3 Support for Fine-grained Security

To make the ASRB Grid as accessible as possible, it was decided at an early stage that Web services should be used wherever possible. It was also a preference of the Australian Research Council that the Globus Toolkit 4 [4] be used. Thus Globus was chosen as the software to provide the grid framework. Version 4 of the Globus Toolkit is mainly built on the Web Service Resource Framework (WSRF), which allows Web services to have state, so that after a request has been made the service can later be queried to obtain updated information about the task.

As data access is a very important part of the ASRB Grid, two important components of the Globus Toolkit for this project are GridFTP [5] and OGSA-DAI (Open Grid Services Architecture Data Access and Integration) [17]. GridFTP is an extension to regular FTP that supports using Globus credentials for authorization and authentication. It has been extended in Globus Toolkit 4 with the Reliable File Transfer service, a Web service for managing secure third-party GridFTP transfers. OGSA-DAI is middleware designed to give secure access to data stores such as relational databases, and to integrate data from different sources via the Grid. It allows relational databases to be accessed using WSRF, giving the ability to access them securely via Web services.

It was decided that a Web portal should be used to access the Grid systems, as this eliminates the need for researchers to install special software on their machines, providing flexibility with respect to client location and host computer. The portal framework chosen is GridSphere [16], with GridPortlets [19] used to access the Grid. GridSphere is an open-source portal framework fully compliant with the JSR 168 specification, so any standards-compliant portlet can be used with GridSphere. GridPortlets is a set of portlets for GridSphere that provide access to Grid resource and user credential management, as well as GridFTP operations and many other useful Grid activities. The GT4Portlets extension allows the execution of jobs on remote Globus Toolkit 4 systems, and further enhances GridPortlets' compatibility with the newest version of Globus.

In order to supply users with credentials to access ASRB Grid resources, a SimpleCA certificate authority is being established. To further facilitate the researcher’s use of the system, PURSe portlets [2] are used to eliminate the user’s need to knowingly interact with this system. Using these portlets, a user fills in a Web-based form to request an account. The user is then sent an email to verify their request and an administrator is informed of the request. The administrator can accept or reject the user, and has the capability to provide the user with access to an account on the Grid; ultimately the user is informed by email of the result. Provided the user is accepted, appropriate Grid credentials are automatically created for the user and a proxy certificate stored for them in a MyProxy server. The user can then log in to the Web portal, using a password supplied by them in their initial request, and a proxy certificate is automatically retrieved from the MyProxy server. This proxy certificate will then be available for access by the portlets in the Web portal. The portlets use these credentials to authenticate with any Grid resources in a manner that is completely transparent to the user.
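The registration workflow described above can be summarised as a simple state progression. The states and transitions below are an illustrative reading of that description, not the actual PURSe portlets implementation.

```python
# Illustrative account-request state machine; not the PURSe portlets code.
TRANSITIONS = {
    ("requested", "email_verified"): "verified",
    ("verified", "admin_approved"): "approved",
    ("approved", "credentials_created"): "active",   # proxy then stored in MyProxy
    ("verified", "admin_rejected"): "rejected",
}

def advance(state, event):
    """Move an account request to its next state, or raise if the event is invalid."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state!r}") from None

state = "requested"
for event in ("email_verified", "admin_approved", "credentials_created"):
    state = advance(state, event)
print(state)  # active
```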

Since identified patient data will be stored on the ASRB Grid, it is vitally important that researchers can access only the data for which they are approved (resulting from ethics approval or otherwise). As a result, users must be given different levels of access to resources based on both their own identity and the groups to which they belong. The Globus Toolkit includes a component that can be used for this purpose: the Community Authorization Service (CAS) [6] (not to be confused with JA-SIG's Central Authentication Service [11]). CAS allows resource providers to give coarse-grained access to various systems, handing finer-grained access control management to the community of users. This is important for the ASRB Grid because there are very complex levels of access for different data resources, so fine-grained control is needed, and the complexities of these relationships can best be handled by the users themselves. GridFTP is the only component of the Globus Toolkit that supports CAS out of the box, though OGSA-DAI can be extended to support CAS with very little impact on performance [18].

Much of the Globus Toolkit is currently accessible only through the use of command-line statements. Technologies such as the CoG Kits [22] and GridPortlets make access to Globus Grids much easier, but the CAS technologies that we have chosen to use have really only been usable from the command line. Thus, one of the first things needed by this project was a set of portlets for accessing CAS. A portlet that allows authorized users to manage CAS entities has been created. With this facility, users with the correct CAS permissions are able to view, create, and delete CAS entities, such as groups or service actions. In addition, the portlet provides the ability to grant and revoke rights to groups and services. CAS will thus also be used by administrators to grant access to various database tables through OGSA-DAI.
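The division of labour CAS provides (coarse-grained grants by the resource owner, finer-grained rights managed by the community) can be sketched conceptually as follows. This is not the CAS API; the class and names are illustrative only.

```python
# Conceptual sketch of community-managed authorization, not the CAS API.
class CommunityAuthz:
    def __init__(self):
        self.groups = {}      # group name -> set of member identities
        self.rights = set()   # (group, action, resource) tuples granted by the community

    def add_member(self, group, identity):
        self.groups.setdefault(group, set()).add(identity)

    def grant(self, group, action, resource):
        self.rights.add((group, action, resource))

    def revoke(self, group, action, resource):
        self.rights.discard((group, action, resource))

    def permits(self, identity, action, resource):
        return any(identity in self.groups.get(g, set())
                   for (g, a, r) in self.rights
                   if a == action and r == resource)

authz = CommunityAuthz()
authz.add_member("newcastle-imaging", "/C=AU/O=ASRB/CN=Jane Researcher")
authz.grant("newcastle-imaging", "read", "gridftp://store.example.org/scans/")
print(authz.permits("/C=AU/O=ASRB/CN=Jane Researcher", "read",
                    "gridftp://store.example.org/scans/"))   # True
```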

4 Future Work

Development of the ASRB Grid is very much an on-going project, and there are a number of parallel development tasks in progress, as described in the following sub-sections.

4.1 Description of Patient Data

The above security framework is designed to provide tightly controlled access to resources such as data and computation. To date, much of the extant patient data has not been available on-line; rather, the data are stored on CDs or DVDs in researchers' offices, and these must be moved to on-line storage subsystems so they can be accessed using the Grid. A further issue is the existence of aggressive firewalls that have been used to protect confidentiality of patient data at some of the host sites. The recently-funded collection of substantial quantities of new patient data has not yet begun but is imminent, so provision of infrastructure for storage and processing of that new data is a priority. In parallel, it is necessary to finalise the meta-data description of the heterogeneous extant (and the homogeneous future) data that will be accessible through the ASRB Grid. Until this significant task is completed, no specific tools development can take place.

4.2 Extension of Portlet Support

To date, there has been a paucity of reported development of portlets to access OGSA-DAI resources, especially for OGSA-DAI secured by CAS. While some OGSA-DAI portlets have been developed, they currently do not provide the level of support for security required by this implementation, and so must be extended to provide the necessary security. It will also be necessary to create or modify some GridFTP portlets to include CAS functionality so that researchers are able to easily share their data with the groups to which they wish to provide such access. It is also planned to create a new PURSe Portlets registration module to automatically enrol users in various CAS groups when their account is created. This will include placing them in a group over which they have complete control, as well as giving them exclusive access to space on a GridFTP server. Users will then be able to create their own self-controlled groups, allowing them to share their data with authorised users while asserting as much fine-grained control as is necessary. There would be no requirement for administrator intervention in the establishment and control of such groups.

4.3 Abstraction over Distributed File Storage

A service that allows users to create logical folders, providing a window onto data on all the different GridFTP servers to which they have access, will also be integrated into the system. Users will simply see a familiar folder-like structure containing sub-folders and files. This is achieved using the Globus Replica Location Service (RLS) [7] and a system that maps a set of logical files to a set of logical folders; the actual files in the folders may be stored on any of the GridFTP servers available to the user of the Grid. The various locations of the data available to the user are abstracted away by this service, allowing users to simply see their data without regard for the location at which it is stored.
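A minimal sketch of the kind of logical-to-physical mapping such a service maintains is given below. The folder layout, host names and file names are invented for illustration, and a real deployment would obtain the physical replicas from RLS rather than from a hard-coded table.

```python
# Illustrative mapping only; in the real system the physical locations
# would come from the Globus Replica Location Service (RLS).
LOGICAL_FOLDERS = {
    "/home/jane/scans": ["lfn:scan-101.nii", "lfn:scan-103.nii"],
}
REPLICAS = {
    "lfn:scan-101.nii": ["gsiftp://storeA.example.org/data/scan-101.nii"],
    "lfn:scan-103.nii": ["gsiftp://storeB.example.org/mri/scan-103.nii"],
}

def list_folder(path):
    """Show a folder the way the user sees it: logical names only."""
    return LOGICAL_FOLDERS.get(path, [])

def resolve(logical_name):
    """Pick one physical replica for a logical file (location hidden from the user)."""
    replicas = REPLICAS.get(logical_name, [])
    return replicas[0] if replicas else None

for lfn in list_folder("/home/jane/scans"):
    print(lfn, "->", resolve(lfn))
```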

4.4 Access to Data Processing and Analysis Facilities

The ultimate aim of the ASRB Grid infrastructure is to provide researchers with the ability to analyse (subsets of) the data collection, leading to advances in the understanding and treatment of schizophrenia. While it will be possible (subject to access rights) for researchers to download data to their own machines to perform analysis, there will be tasks that will benefit from access to the parallel resources of the Grid. For example, the data associated with a single MRI scan can exceed one gigabyte, and transfer of such quantities of data across the Internet is expensive with respect to time (noting that some of the member sites are up to 4,000 kilometres apart). Analysis of such data is more efficiently performed by positioning the computation close to the data source, with high-bandwidth data path(s) joining them. Unfortunately, automatically executing a task on a set of remote machines is difficult. Projects such as GT4Portlets allow the execution of jobs on a single remote machine, and projects such as the Gridbus Broker [21] automatically allocate tasks to servers, but the interfaces to these are very general. Thus a further task for this project is to create a portlet wizard that allows the easy creation of a portlet to execute a particular application. It is envisaged that these portlets will be based on the Gridbus Broker, but will enable researchers to choose input files and set parameters using a simple, easily understandable Web form, as sketched below. The provision of a wizard will make it easy for developers to create portlets for many different programs. If a specific program has special needs, however, developers will still have access to the full source code so that the portlet can be modified as needed. This will enable researchers and developers to use the processing capabilities of the distributed compute servers much more easily than is currently possible.
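The following is a small sketch of the idea: a web-form submission (represented here as a dictionary) is turned into a job description that a broker could schedule. The field names and the job-description format are assumptions made for illustration; they are not the Gridbus Broker or GT4Portlets interfaces.

```python
# Illustrative translation of a simple web form into a job description.
def form_to_job(form):
    return {
        "executable": form["application"],        # analysis program chosen via the wizard
        "arguments": ["--smoothing", form["smoothing_mm"]],
        "input_files": form["input_files"],       # logical names resolved by the data service
        "output_folder": form["output_folder"],
        "stage_to_data": True,                    # prefer running close to the data
    }

form = {
    "application": "segment_brain",
    "smoothing_mm": "8",
    "input_files": ["lfn:scan-101.nii", "lfn:scan-103.nii"],
    "output_folder": "/home/jane/results",
}
print(form_to_job(form))
```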

5 Conclusion

This paper introduces a project that uses the Globus toolkit and related technologies to allow Australian mental health researchers to share data and application programs in their quest for an understanding of schizophrenia and, ultimately, improvements in its treatment. A web services portal that provides fine-grained control over user access to resources is described. This portal simultaneously provides simple authentication-based access for users and certificate-based access to subsets of the entire resource collection. Users are unaware of host network boundaries and the need for separate authentication at the disparate sites and servers; these requirements are abstracted away by the portal. The ASRB Grid is very much a work in progress. On-going development of abstractions over distributed data storage, remote compute services and the portal is also presented. These facilities will result in: nested folders that provide consistent access to locally and remotely stored data; intuitive wizard-based access to distributed compute servers and application programs; and the ability for users to provide individuals and/or groups with controlled access to their personal data store.

6 References

1. Carr, V., Lewin, T., Neil, A., Halpin, S., and Holmes, S., Premorbid, psychosocial and clinical predictors of the costs of schizophrenia and other psychoses. British Journal of Psychiatry, 2004. 184: p. 517-525.
2. Christie, M., PURSe Portlets Website, http://www.extreme.indiana.edu/portals/purse-portlets.
3. Foster, I. and Kesselman, C., The Grid: Blueprint for a New Computing Infrastructure. 1999: Morgan Kaufmann.
4. Foster, I., Globus Toolkit Version 4: Software for Service-Oriented Systems. In IFIP International Conference on Network and Parallel Computing. 2005: Springer-Verlag.
5. Globus, GT 4.0 GridFTP, http://www.globus.org/toolkit/docs/4.0/data/gridftp/.
6. Globus, GT 4.0: Security, http://www.globus.org/toolkit/docs/4.0/security/.
7. Globus, RLS: Replica Location Service, http://www.globus.org/rls/.
8. Gottesman, I.I., McGuffin, P., and Farmer, A.E., Clinical genetics as clues to the real genetics of schizophrenia (a decade of modest gains whilst playing for time). Schizophrenia Bulletin, 1987. 13(1): p. 23-47.
9. Gureje, O. and Bamidele, R.W., Gender and schizophrenia: association of age at onset with antecedent, clinical and outcome features. Australia and New Zealand Journal of Psychiatry, 1998. 32(3): p. 415-423.
10. Jablensky, A., Epidemiology of schizophrenia: the global burden of disease and disability. European Archives of Psychiatry and Clinical Neuroscience, 2000. 250(6): p. 274-285.
11. JA-SIG, JA-SIG Central Authentication Service, http://www.ja-sig.org/products/cas.
12. Kremen, W.S., Faraone, S.V., and Seidman, L.J., Neuropsychological risk indicators for schizophrenia: a preliminary study of female relatives of schizophrenic and bipolar probands. Psychiatric Research, 1998. 79(3): p. 227-240.
13. LONI, Laboratory of Neuro Imaging, http://www.loni.ucla.edu/.

14. NISAD, The NISAD/LONI Virtual Brain Bank, http://www.nisad.org.au/newsEvents/resNews/wwwsczres.asp.
15. NISAD, http://www.nisad.org.au/.
16. Novotny, J., Russell, M., and Wehrens, O., Gridsphere: A Portal Framework for Building Collaborations, Gridsphere Project Website.
17. OGSA-DAI, OGSA-DAI Software, http://www.ogsadai.org.uk/index.php.
18. Pereira, A., Muppavarapu, V., and Chung, C., Role-Based Access Control for Grid Database Services Using the Community Authorization Service. IEEE Trans. on Dependable and Secure Computing, 2006. 3(2): p. 156-166.
19. Russell, M., Novotny, J., and Wehrens, O., The Grid Portlets Web Application: A Grid Portal Framework, Gridsphere Project Website.
20. Trillenberg, P., Lencer, R., and Heide, W., Eye movements and psychiatric disease. Current Opinion in Neurology, 2004. 17(1): p. 43-47.
21. Venugopal, S., Buyya, R., and Winton, L., A Grid Service Broker for Scheduling e-Science Applications on Global Data Grids. Concurrency and Computation: Practice and Experience, (accepted Jan 2005).
22. von Laszewski, G., Gawor, J., Lane, P., Rehn, N., Russell, M., and Jackson, K., Features of the Java Commodity Grid Kit. Concurrency and Computation: Practice and Experience, 2002. 14: p. 1045-1055.


WebGRelC: Towards Ubiquitous Grid Data Management Services Giovanni Aloisio, Massimo Cafaro, Sandro Fiore, Maria Mirto Center for Advanced Computational Technologies, University of Lecce, Italy Center for Euro Mediterranean Climate Changes, Italy {giovanni.aloisio, massimo.cafaro, sandro.fiore, maria.mirto}@unile.it

Abstract Nowadays, data grid management systems are becoming increasingly important in the context of the recently adopted service oriented science paradigm. The Grid Relational Catalog (GRelC) project is working towards an integrated, comprehensive data grid management solution. This paper describes WebGRelC, which is a dedicated grid portal allowing data handling, publishing, discovery, sharing and organization, and its underlying data grid services.

1. INTRODUCTION

As pointed out by Ian Foster, we have in the last few years moved towards a new, service oriented science [1] in which software is envisioned as services, and services as platforms. Increasingly, services are not only computation rich but also data rich, producing a huge amount of data distributed across multiple data servers. There is a growing need for a grid infrastructure allowing scientific communities to share data securely, efficiently and transparently. Datasets, once created, need to be visualized, published, downloaded, annotated, etc. Discovery mechanisms, such as searchable metadata directories, must be provided to find relevant data collections. Integration and federation services need to cope with independently managed legacy datasets, to infer new knowledge from existing distributed data. Although fundamental building blocks such as distributed file systems and semantic storage already exist, data grid management systems are still in the pioneering phase. The main challenge is to design and implement reliable storage, search and transfer capabilities for numerous and/or large files over geographically dispersed heterogeneous platforms.

The Grid Relational Catalog (GRelC) project [2], a data grid research project developed at the Center for Advanced Computational Technologies (CACT) at the University of Lecce, is working towards an integrated, comprehensive data grid management solution. It provides, besides traditional command line and graphical interfaces, a dedicated grid portal allowing data handling, publishing, discovery, sharing and organization. Grid portals are web gateways to grid resources, tools and data. They hide the underlying grid technologies and provide advanced problem solving capabilities to solve modern, large scale scientific and engineering problems. This paper describes WebGRelC, which is the GRelC grid portal, and its underlying data grid services.

The outline of the paper is as follows. In Section 2, we present the GRelC data grid services underlying the portal, whereas in Section 3 we describe the portal architecture. In Section 4, we discuss the current implementation, technologies and related issues; in Section 5 we present a use case related to the use of the portal for a bioinformatics experiment, whilst Section 6 recalls related work. We draw our conclusions in Section 7.

2. GRELC DATA GRID SERVICES

The main goal of the GRelC project is to provide a set of data grid services to access, manage and integrate data (i.e. databases and files) in grid environments. The GRelC data grid services already implemented are the Data Access Service (DAS), the Data Storage Service (DSS) and the Data Gather Service (DGS).

The Data Access Service (DAS) has been designed to provide a uniform, standard interface to relational and non-relational (i.e. textual) data sources. It is an intermediate layer which lies between grid applications and Database Management Systems.

The Data Storage Service (DSS) provides a comfortable, lightweight solution for disk storage management. It efficiently and transparently manages huge collections of data spread across grid environments, promoting flexible, secure and coordinated storage resource sharing and publication across virtual organizations. Besides data handling and remote processing operations, the DSS also provides publication and information discovery capabilities, as needed due to the large number of stored objects. The DSS represents a high-performance implementation of the grid workspace concept, which is a virtualized and grid-enabled storage space that a community of users can use to share and manage their files and folders, taking into account fine-grained data access policies. Grid workspaces represent grid storage spaces accessible by authorized users sharing common interests. Within a DSS, data is fully organized into workspaces and, for each one of them, the DSS admin must define a set of authorized users, groups and VOs, the workspace administrators and the physical mounting point.

Finally, the Data Gather Service (DGS) offers data federation capabilities, providing a second level of virtualization (data integration). This service, which lies on top of DASs or DSSs, allows the user to look at a set of distributed data sources as a single logical entity, thus implementing a data grid federated approach.

The proposed WebGRelC architecture provides ubiquitous web access to widespread grid-enabled storage resources and metadata, so it is currently concerned with both the Data Storage and Data Access Services. Clients are also available both as a command line and a graphical interface to manage collections of files, but the Grid Portal interface better addresses and increases both transparency and pervasiveness.
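As an illustration, the per-workspace configuration a DSS administrator defines might look something like the following. The field names and values are invented for this sketch and do not correspond to the actual GRelC configuration syntax.

```python
# Illustrative workspace definition only; not the actual GRelC DSS
# configuration format.
workspace = {
    "name": "climate-sim",
    "mount_point": "/data/dss/climate-sim",          # physical mounting point
    "authorized_users": ["/O=GRelC/CN=Maria Mirto"],
    "authorized_groups": ["cact-staff"],
    "authorized_vos": ["euromed-climate"],
    "administrators": ["/O=GRelC/CN=Sandro Fiore"],
}
```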

providing mutual authentication among grid services, users and machines, data encryption and delegation support. Moreover, security concerning HTTP Internet connections needs to be properly addressed. Within our system, we basically chose to adopt the Globus Grid Security Infrastructure [3]; b) user-friendliness: the portal must provide user-friendly web pages to simplify the interaction between the users and the data grid environment. Our choice leverages current web technologies, including XHTML, CSS etc. but we also plan to switch to recent developments, including the use of portlets, Java Server Faces etc.; c) pervasiveness: the proposed solution leverages the pervasiveness of Web technology to provide users with ubiquitous grid data management facilities. It is worth noting here that client requirements consist just of a standard web browser; d) transparency: the proposed grid portal must conceal a lot of details about grid storage components, data transfer protocol details, heterogeneous storage resources, technological issues, command line parameters/options, and so on. This requirement is multifaceted and in a data grid environment it concerns, among the others, access, location, namespace, concurrency and failure transparency. Several design and technical choices highlighted within this work address all of these transparency issues. 3.2 Portal Architecture The grid portal architecture (Fig. 1) follows a standard three-tier model. The first tier is a client browser that can securely communicate to a web server over an HTTPS connection (no other specific requirements are imposed). On the second tier, the Web Server implements the WebGRelC portal (WGP) which consists of several components (see Section 3.2.1) leveraging (i) GRB [4], and (ii) GRelC DSS, DAS and SDAI client libraries. WebGRelC interacts with a MyProxy server [5] for secure user’s credentials (proxy) storage and retrieval and with the Portal Metadata Catalogue (PMC) to manage user’s profiles. Finally, GRelC DSSs are deployed on the third tier, the data grid infrastructure, providing a lightweight and grid enabled solution for disk storage management. 3.2.1

WebGRelC Portal

WebGRelC represents the core of the proposed architecture, implementing a grid portal able to

retrieve data and present them (via HTTP protocol), within HTML pages. As can be seen in Fig. 2 the WebGRelC portal consists of the following: - Profile Manager: it handles the user’s profile managing metadata stored within the Portal Metadata Catalog (PMC relational database). It allows (i) inserting, updating and deleting personal information as well as (ii) managing a list of available grid enabled storage resources, workspaces, etc. Currently, the PMC runs on different DBMSs by means of the GRelC Standard Database Access Interface (SDAI, see Section 4.3); - Credential Manager: it allows configuring the credentials to be used for a given set of resources, retrieving them from a MyProxy server. After this initial configuration step, the WebGRelC grid portal transparently retrieves the credentials needed to access specific data sources; - Remote Administration: it provides basic functionalities to access and manage metadata information stored within GRelC DSS Metadata Catalog. Through the portal, the user can remotely manage administration information about (i) users, groups and VOs, (ii) internal workspaces configurations, (iii) data access control policies, etc. Moreover, the proposed grid portal provides admin sections for logging (to display information related to all of the operations carried out at the DSS side), and check-coherence (to report system coherence problems between data (content) and metadata (context));

(iv) data access, creation and deletion capabilities (Posix-like oriented). File Manager also supports parallel and partial file transfer as well as file copy between two GRelC DSSs (both push and pull mode are currently available). Along with synchronous functionalities, the proposed grid portal supports the GRelC Reliable File Transfer (G-RFT) mechanism, an asynchronous service primarily intended to be used to reliably copy files from one DSS to another one. Users can submit G-RFT requests simply by filling out a web form containing information about (i) DSS source and destinations, (ii) myproxy access parameters, (iii) file transfer options related to data transfer protocol (HTTPG, GridFTP) and data transfer mode (push or pull), (iv) request options connected with priority level of the G-RFT request, the maximum number of retries to be used in case of failed data transfer, related delay and backoff (linear, exponential). Moreover, within the WGP, users can display status and options about submitted G-RFT requests, abort a data transfer, as well as resubmit again G-RFT requests; - Metadata Manager: it provides basic functionalities to (i) annotate files, that is to publish and manage metadata at the GRelC DAS side, (ii) display metadata information about files, folders, workspaces and schema, (iii) manage metadata schema modifying the list of elements associated to the stored objects and (iv) query metadata information in order to retrieve a list of objects satisfying conjunctive search conditions (basic digital libraries capabilities).

Figure 1. WebGRelC architecture

Figure 2. Detailed view of WebGRelC architecture

3.3 Security Issues

To log in to the WGP the user must supply the correct username and password. The portal security model includes the use of the HTTPS protocol for secure communication with the client browser and secure cookies to establish and maintain user sessions. Moreover, we decided to adopt the Globus Toolkit Grid Security Infrastructure (GSI), a solution widely accepted and used in several grid projects, in order to perform the following security tasks within the data grid environment:

- mutual authentication between the WGP and the MyProxy server, DAS or DSS;
- communication protection (by means of data cryptography) for the data exchanged between the WGP and the MyProxy server, DAS or DSS;
- delegation mechanisms to perform data management tasks on the grid.
Finally, SSL support is provided when the WGP Profile Manager interacts with the PMC.

3.4 Metadata

Metadata management is crucial within such a data grid system. To aid scientists in discovering interesting files within collections of thousands or millions of files, the proposed architecture must provide, besides basic data management facilities, metadata publication and semantic search capabilities. Currently, metadata management concerns two different types of metadata: internal (or low-level) and application-oriented (or high-level). In the former case, metadata is related to the physically stored object (for example creation date, file owner, size, etc.) and is system defined, so users can modify neither the metadata attributes (schema) nor their values (instances). This kind of metadata is managed by the GRelC DSS. In the latter case, metadata is application specific and the related schema can vary depending on the particular context; in this case, through the portal, authorized users can annotate both files and workspaces, adding, deleting and updating application-level metadata and the related metadata schema. Application-oriented metadata are stored within the GRelC DAS. Finally, to provide a basic semantic search capability, a simple search form has been provided within the WGP.

4. WEBGRELC IMPLEMENTATION

In the following subsections we describe the WebGRelC implementation, discussing the grid portal technologies, grid middleware, and GRelC and GRB libraries involved.

4.1 Grid Portal Technologies

The WebGRelC grid portal has been developed using the Model View Controller design pattern. To efficiently address performance and modularity we adopted the FastCGI technology leveraging the Apache Web Server.

Taking into account the main subcomponents of the WGP, the portal provides XHTML-based web pages to:
- support credential delegation and single sign-on to the Grid;
- configure grid storage resources and services as well as manage the grid portal user profile;
- manage DSS users, groups and VO authorizations;
- upload and download files as well as transfer data from one storage resource to another;
- perform activities related to digital libraries (metadata-based search engine) by accessing the GRelC DAS;
- manage metadata and metadata schema related to the objects stored within the storage resources;
- submit and manage G-RFT requests.

4.2 Middleware

The current version of WebGRelC is based on the Globus Toolkit 4.0.3 (the latest stable release as of November 2006) as grid middleware; basically, we exploited the Globus GSI libraries. Web service components such as the GRelC DSS and DAS strongly exploit the gSOAP Web Services development toolkit [6]. It offers an XML to C/C++ language binding to ease the development of SOAP/XML Web services in C and C/C++. gSOAP provides a transparent SOAP API using proven compiler technology. These technologies leverage strong typing to map XML schemas to C/C++ definitions. Strong typing provides greater assurance on content validation of both WSDL schemas and SOAP/XML messages. As a result, SOAP/XML interoperability is achieved with a simple API, relieving the user from the burden of WSDL and SOAP details and thus enabling her to concentrate on the application-essential logic. The compiler enables the integration of (legacy) C/C++ software in SOAP applications that share computational resources and information with other Web services, possibly across different platforms, language environments, and disparate organizations located behind firewalls. Finally, to guarantee a secure data communication channel between the WGP and the DSS or DAS, we utilized the GSI support available as a gSOAP plug-in [7]. We did not use the Globus Toolkit 4.0.3 C WS Core, which is a C implementation of WSRF (Web Services Resource Framework), because it lacks a usable authorization framework. Even though it is possible to develop WSRF grid services in C (we are already migrating our software to the Globus Toolkit implementation of WSRF), it is not possible to deploy production-level services.

Indeed, grid services deployed in the C WS Core container can only use the default SELF authorization scheme (a client is allowed to use a grid service only if the client's identity is the same as the service's identity), which is useless for a production service. Unfortunately, the globus-wsc-container program does not have options to handle different authorization schemes. A possible solution could be the development of a customized service that uses the globus_service_engine API functions to run an embedded container, setting the GLOBUS_SOAP_MESSAGE_AUTHZ_METHOD_KEY attribute on the engine to GLOBUS_SOAP_MESSAGE_AUTHZ_NONE to omit the authorization step and then using the client's distinguished name to perform authorization. However, with the Globus Toolkit 4.0.3 it is not possible for a C grid service to retrieve the distinguished name of a client contacting it, so this is not a viable option and we have to wait for the next stable release of the Globus Toolkit (4.2), which should provide major enhancements to the C WS Core, including a usable authorization framework.

4.3 GRelC Libraries

The GRelC libraries mainly address data management activities. Basically, the WebGRelC grid portal exploits the GRelC SDAI, SDTI, DAS and DSS libraries. The GRelC SDAI library (used within the WGP Profile Manager component) provides transparent and uniform access to the PMC relational database. It exploits a plug-in based architecture leveraging dynamic libraries. Currently, SDAI wrappers are available for the PostgreSQL, MySQL, SQLite, Unix ODBC and Oracle DBMSs. An SQLite-based PMC has a twofold benefit: it provides very good performance (due to the embedded database management) and it increases service robustness and reliability because it does not depend on an external DBMS server. The GRelC SDTI provides a data management library to transfer files between the WGP and a DSS (get/put) or between pairs of DSSs (copy). Basically, it is a C library (leveraging a plug-in based approach) which virtualizes the data transfer operations, providing high-level interfaces for get/put/copy (parallel and partial). Two basic modules related to GridFTP and HTTPG (HTTP over GSI) are currently available. Further drivers covering additional protocols (such as FTP, SFTP or SCP) are actively being developed and will be easily added to the system thanks to the modular design and implementation of this library.

The SDTI library is used within the WGP File Manager component. The GRelC DAS client library provides many functionalities related to the DAS component. Among other things, it allows (i) managing metadata, (ii) submitting semantic queries, (iii) browsing metadata, etc. This library is used within the Metadata Manager. Finally, the GRelC DSS client library provides many functionalities related to the interactions with the DSS components. Among other things, it allows (i) managing file transfers and workspaces, (ii) submitting, monitoring and deleting G-RFT requests, (iii) managing user, group and VO membership, etc. This library is extensively used within the File Manager and Remote Administration components of the WGP.

4.4 GRB Libraries

The GRB software is developed within the Grid Resource Broker project at the CACT of the University of Lecce. Currently, the GRB team supplies users with several production libraries mainly connected with (i) job submission, (ii) resource discovery, (iii) credential management and (iv) grid file transfer. For WebGRelC development, we exploited the grb_gridftp [8] and grb_myproxy libraries to transfer data among grid nodes and to manage and retrieve user credentials, respectively.

5. USE CASE: WEBGRELC FOR BIOINFORMATICS DATA

The WebGRelC Grid Portal is part of the SPACI [9] middleware (with regard to data management operations) and it is also actively used within the SEPAC [10] production grid. Several GRelC DSSs are now deployed in Europe, in Lecce, Naples, Cosenza, Milan and Zurich, managing different storage resources and several tens of thousands of files primarily related to biology experiments. Through the proposed grid portal, bioinformaticians can manage and share their workspace areas and annotate files connected with their experiments. Application-level metadata can be published and stored within the system simply by filling out the web forms provided by the grid portal. Search and retrieval operations on metadata allow users to find the desired objects within the system, displaying query results in several formats (HTML tables, plain text or XML).

More in detail, we defined several workspace areas to store experimental results, protein structures, etc. The bioinformatics workspace contains the files produced by the experiments and the files containing the protein structures, retrieved from the UniProt KnowledgeBase (UniProtKB) data bank [11]. The experiment carries out a multiple sequence alignment (MSA) between each of the human proteins available in the UniProtKB database (about 70,845 sequences, retrieved from the uniprot_sprot_human.dat and uniprot_trembl_human.dat flat files) and those stored in the UniProt NREF data bank. Homologous sequences are then matched to identify functional domains. Indeed, multiple alignment is important for studying regions that are conserved during evolution and that characterize, with good probability, the biological functionality of the sequence. PSI-BLAST (Position Specific Iterative BLAST) [12], available in the NCBI toolkit, has been used for the MSA. After running one experiment we produced about 70 thousand alignment files (in XML format), storing them within the bioinformatics DSS workspace. The XML Schema Definition of the PSI-BLAST output is shown in Fig. 3. Each resulting XML file contains all of the sequences producing significant alignments for a specified number of iterations (in this experiment we considered 2 iterations).

The metadata set describing the experiment includes, among other things, the protein identifier, e-value, score, accession number, etc. Figure 4 illustrates an example query related to the bioinformatics domain.

Figure 3. XSD of the PSI-BLAST output

The user can choose a data workspace and submit a query, choosing the output format. The files matching the search criteria are returned. The user can then select a file of interest in order to display all of the relevant file metadata. She can also copy, annotate or download these files, as needed.

Figure 4. Semantic Search Engine: example query and results

6. RELATED WORK

MySRB [13] is a web-based resource sharing system that allows users to share their scientific data collections with their colleagues. It provides a system where users can organize their files according to logical cataloguing schemes independent of the physical location of the files and associate metadata with these files. MySRB uses the Storage Resource Broker (SRB) [14] and the Metadata Catalog (MCAT) developed at SDSC as its underlying infrastructure. MySRB and the WGP share the same goals. The aim of the DataPortal project [15] is to develop the means for a scientist to explore data resources, discover the data they need and retrieve the relevant datasets through one interface, independently of the data location. Separate instances of the DataPortal are currently being installed as part of the grid environments of the eMinerals and eMaterials projects. The new web services architecture of the portal has allowed easy integration with other services such as the CCLRC HPCPortal. In the current system, these are integrated using standard Web protocols, such as web services, HTTP and SOAP, since support for emerging grid technologies, such as OGSA and grid services, is not available.

7. CONCLUSIONS AND FUTURE WORK

The paper presented an overview of the WebGRelC Grid Portal. We presented the portal architecture and discussed its implementation. WebGRelC bridges the gap between scientists and their data located in grid environments, providing an effective, production-level data grid management system. The portal is currently in production in the European SPACI and SEPAC grids. The recently released WSRF-based Globus Toolkit will be supported in the near future, as soon as the provided tools are stable and mature enough for production usage. Future work related to data grid management concerns a complex semantic search engine (based on a P2P federated approach leveraging GRelC DGSs) developed and tested within the SEPAC production Grid.

References

[1] I. Foster, "Service-Oriented Science", Science, 308, pp. 814-817, 2005.
[2] G. Aloisio, M. Cafaro, S. Fiore, M. Mirto, "The Grid Relational Catalog Project", Advances in Parallel Computing, "Grid Computing: The New Frontiers of High Performance Computing", L. Grandinetti (Ed), pp. 129-155, Elsevier, 2005.
[3] I. Foster, C. Kesselman, G. Tsudik, S. Tuecke, "A Security Architecture for Computational Grids", Proceedings of the 5th ACM Conference on Computer and Communications Security, pp. 83-92, 1998.
[4] G. Aloisio, M. Cafaro, G. Carteni, I. Epicoco, S. Fiore, D. Lezzi, M. Mirto, S. Mocavero, "The Grid Resource Broker Portal", to appear in Concurrency and Computation: Practice and Experience, Special Issue on Grid Computing Environments.
[5] J. Basney, M. Humphrey, V. Welch, "The MyProxy Online Credential Repository", Software: Practice and Experience, 35(9):801-816, 2005.
[6] R.A. van Engelen, K.A. Gallivan, "The gSOAP Toolkit for Web Services and Peer-To-Peer Computing Networks", Proceedings of the IEEE CCGrid Conference, May 2002, Berlin, pp. 128-135.
[7] G. Aloisio, M. Cafaro, I. Epicoco, D. Lezzi, "The GSI plug-in for gSOAP: Enhanced Security, Performance, and Reliability", Proceedings of Information Technology Coding and Computing (ITCC 2005), IEEE Press, Volume I, pp. 304-309.
[8] G. Aloisio, M. Cafaro, I. Epicoco, "Early experiences with the GridFTP protocol using the GRB-GSIFTP library", Future Generation Computer Systems, Volume 18, Number 8 (2002), pp. 1053-1059, Special Issue on Grid Computing: Towards a New Computing Infrastructure, North-Holland.
[9] The Italian Southern Partnership for Advanced Computational Infrastructures. http://www.spaci.it
[10] The Southern European Partnership for Advanced Computing. http://www.sepac-grid.org
[11] A. Bairoch, R. Apweiler, C.H. Wu, W.C. Barker, B. Boeckmann, S. Ferro, et al., "The Universal Protein Resource (UniProt)", Nucleic Acids Res., (33):154-159, 2005. http://www.uniprot.org
[12] S.F. Altschul, T.L. Madden, A.A. Schäffer, J. Zhang, Z. Zhang, W. Miller, D.J. Lipman, "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res., 25(17):3389-3402, 1997. http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nphpsi
[13] M. Wan, R. Moore, A. Rajasekar, "MySRB & SRB - Components of a Data Grid", The 11th International Symposium on High Performance Distributed Computing (HPDC-11), Edinburgh, Scotland, July 24-26, 2002.
[14] C. Baru, R. Moore, A. Rajasekar, M. Wan, "The SDSC Storage Resource Broker", Proc. CASCON'98 Conference, Nov. 30 - Dec. 3, 1998, Toronto, Canada.
[15] DataPortal. http://www.e-science.clrc.ac.uk/web/projects/dataportal

Workflow Management Through Cobalt

Gregor von Laszewski (1,2), Christopher Grubbs (3), Matthew Bone (3), and David Angulo (4)

(1) University of Chicago, Computation Institute, Research Institutes Building #402, 5640 S. Ellis Ave., Chicago, IL 60637
(2) Argonne National Laboratory, 9700 S. Cass Ave., Argonne, IL 60439
(3) Loyola University Chicago, Department of Computer Science, Lewis Towers, Suite 416, Water Tower Campus, Chicago, Illinois 60611, USA
(4) DePaul University, School of Computer Science, Telecommunications and Information Systems, 243 South Wabash Ave, Chicago, Illinois 60604, USA

Contents

1 Introduction: Queueing systems and workflows
2 Cobalt: System software for parallel machines
3 Karajan: A workflow scripting language
  3.1 Java2Karajan
4 Qstat Monitor: observing the workflow
5 Implications and possibilities
  5.1 Extensibility: adding PBS system support
  5.2 Further research: graphical workflows in the Qstat Monitor
6 Conclusion
7 Availability

Abstract

Workflow management is an important part of scientific experiments. A common pattern that scientists are using is based on repetitive job execution on a variety of different systems, and managing such job execution is necessary for large-scale scientific workflows. The workflow system should also be client-based and able to handle multiple security contexts to allow researchers to take advantage of a diverse array of systems. We have developed, based on the Java Commodity Grid Kit (CoG Kit), a sophisticated and extensible workflow system that seamlessly integrates with the queueing system Cobalt through the advanced features provided by the CoG Kit.

1 Introduction: Queueing systems and workflows

Parallel computing systems offer scientists incredibly powerful tools for analyzing and processing information. These supercomputers benefit researchers in many diverse fields, from medicine to physics to social sciences.

In order to take advantage of parallel system capabilities, scientists must be able to schedule their jobs with a high degree of confidence, knowing that the jobs will run, the results will be saved, and any errors will be handled. Use of these systems is not, of course, limited to one user; therefore the systems must be able to reliably handle large numbers of tasks from many different users. This is where queueing system software comes in. Queueing software handles job execution on parallel systems, scheduling the jobs and allocating resources accordingly. While the simplest queueing systems would simply execute jobs on a first-come-first-served basis, other factors, like the projected running time or number of nodes required, may come into play in more complex systems. By maintaining a user job queue, the queueing software automates the scheduling process such that the system's resources are utilized to their fullest extent. It also provides an interface for monitoring the current system queue and retrieving information about a job, such as its location in the cluster or grid, how many nodes it is using, how long it has been running, and so on. Some queueing systems also allow server-side specification of dependencies, or workflows, so that users can string together tasks which depend on prior tasks. There are many queueing systems available, in both commercial and open-source implementations. Commercial queueing software packages include Platform's Load Sharing Facility (LSF) [9], IBM's Tivoli Workload Scheduler [17], and Altair Engineering's PBS Pro [11]. An open-source implementation of PBS, called OpenPBS, also exists [10]. Other open-source packages providing job scheduling functionality include the Globus Toolkit [6], Condor [5], Sun's GridEngine [12], and Cobalt [2], used on Argonne National Laboratory's BGL cluster system. Furthermore, many portals exist which streamline access to queueing software. TeraGrid's portal is one example [16]. The TeraGrid Portal is a web-based system, allowing the grid's users to manage their projects, receive important system-related information, and access documentation, all from within their browsers. As was mentioned before, some queueing systems such as LSF provide workflow functionality; however, such workflows are necessarily tied to one resource. A workflow submitted to one grid system will only be able to utilize the resources provided by that grid system. The addition of a client-side workflow implementation, however, allows access to numerous resources. Through a client-side workflow, a user may access both grid and non-grid systems, working across several different security contexts. A particularly flexible client-based workflow engine called Karajan is provided in the Java CoG Kit [7], an open-source set of grid tools. This paper will focus on the integration of Karajan with the aforementioned queueing system Cobalt. We chose Cobalt primarily because it is used on BGL at Argonne National Laboratory, where our research is based. Furthermore, because BGL is not a grid system, we wanted to demonstrate that tools normally applied to grid systems could also be applied to non-grid parallel systems. This will demonstrate Karajan's flexibility in creating robust workflow solutions; the techniques used to integrate Cobalt with Karajan can be employed for other queueing systems as well, as we will show through our additional integration with PBS systems. Before we go into details about Karajan, a brief introduction to Cobalt will be useful.

2 Cobalt: System software for parallel machines

Cobalt is used for handling jobs on BGL, the 1024-node, 2048-processor IBM BlueGene/L system at Argonne National Laboratory. BGL is used primarily for scientific computing, application porting, system software development and scaling studies [1]. Following a "smaller and simpler is better" philosophy, the Cobalt software package trades feature-richness for agility [3]. While its core implementation comprises less than 4000 lines of code, mostly in Python, its component-based architecture makes it a highly adaptable and useful research platform [3]. Researchers can readily add new components or rewrite existing ones. Users may log in to BGL via SSH using public key authentication. The cqsub command submits a job to the queue. Like the qsub command found on other queueing systems, cqsub takes a variety of command line arguments, including the desired execution time, the number of nodes requested, the path to the executable, and so on. Cqsub also takes some specialized arguments, including one for dynamic kernel selection. This feature, which is only found in Cobalt, allows users to test experimental kernels and conduct system software research [3]. It is special features like this that motivated our integration of a specialized Cobalt submission library with the Karajan workflow engine.

Figure 1: sample caption

3 Karajan: A workflow scripting language

Karajan is a parallel scripting language which uses a declarative concurrency approach to parallel programming [4]. It is a dual-syntax language, supporting both a native syntax and an XML-based format. For the purposes of this paper, we will use the XML syntax, although there is no underlying difference between the two forms. Basic sequential workflows in Karajan are more or less lists of tasks; the user simply creates the task elements in the proper execution order. To execute tasks in parallel, the user places the tasks in question within a parallel element. Karajan also provides a sequential element, in case the user requires certain tasks within a parallel element to execute sequentially. In addition to its parallel and sequential workflow components, Karajan provides elements for remote task execution, file transfers, a variety of data structures, logical operators, variables, access to GUI forms, Java object and method bindings, and more. Karajan contains a general task execution framework which employs the CoG Kit's task/provider model. The task/provider model provides a consistent programmatic interface while the underlying provider implementation (which may be GT2, GT4, SSH, etc.) changes dynamically. This approach is useful in many ways, affording developers a high-level syntax for specifying tasks while leaving actual submission and execution details to the CoG Kit. It also allows developers a simple way to add new protocols without having to change the task implementation; they need only write a new provider. One downside, however, is that it does not allow for special features specific to a certain queueing system, such as Cobalt's kernel profile selection options. In order to utilize Cobalt's features within the Karajan framework, we employ Karajan's Java binding functionality, which allows limited interfacing with Java classes, objects, and methods. Through the Java bindings, we are able to invoke Java code which logs into a Cobalt system, submits a job, and monitors its status. This approach allows us to bypass the CoG Kit abstraction layer while still affording the same workflow capabilities as Karajan's task/provider model.

*Java Binding in Karajan

Objects can be instantiated in a Karajan script with a dedicated object-instantiation element, provided that the class is present in the CoG Kit's classpath.

The instantiated object's methods can be accessed with a corresponding method-invocation element; static methods may also be accessed with this element by either setting its "static" attribute to true or by specifying a class name. Creating these bindings can be a tedious task, especially if many different methods are involved. To automate this process we created the Java2Karajan tool.

3.1 Java2Karajan

The Java2Karajan program takes a compiled Java class file and, using the Java reflection API, constructs a Karajan Java binding. The approach is similar to that taken by SWIG [14], where libraries implemented in a lower-level language, in this case Java, are bound and translated to a higher-level language, in this case Karajan. As an example, Java2Karajan can generate a binding for instantiating an Example object, as well as a binding for that object's method myMethod(), which takes an int and a String as parameters. If a script imports this Java binding, it can, for example, instantiate an Example object with the generated instantiation element and then call myMethod(4, "hello") with the generated method element. If a method returns a value, that value can be placed within a Karajan variable and used elsewhere in the script.
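As a rough illustration of the reflection step such a generator has to perform, the following self-contained Java sketch enumerates the public constructors and methods of a compiled class; the printed output format is purely illustrative and is not the Karajan syntax actually emitted by Java2Karajan.

import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

// Minimal sketch: list what a binding generator would need to know about a class.
public class BindingSketch {
    public static void main(String[] args) throws Exception {
        // e.g. java BindingSketch Example
        Class<?> cls = Class.forName(args[0]);

        for (Constructor<?> c : cls.getConstructors()) {
            System.out.println("constructor binding for " + cls.getSimpleName()
                    + " with " + c.getParameterTypes().length + " parameter(s)");
        }
        // getMethods() returns all public methods, including inherited ones.
        for (Method m : cls.getMethods()) {
            StringBuilder params = new StringBuilder();
            for (Class<?> p : m.getParameterTypes()) {
                if (params.length() > 0) params.append(", ");
                params.append(p.getSimpleName());
            }
            // A real generator would emit a Karajan element definition here.
            System.out.println("method binding: " + m.getName() + "(" + params + ")");
        }
    }
}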

* Integrating Cobalt job submission with Karajan workflows

In order to take advantage of Java2Karajan for our purposes, we first had to create a set of Java classes for submitting and monitoring the status of jobs on a Cobalt queueing system. The basic sequence of events in a Cobalt submission is as follows: SSH initialization → SSH authentication → open SSH session channel → execution of cqsub, returning the job's ID → close channel. We created a CobaltSubmitter object which utilizes the SSH connection, authentication, and command execution functionality found in the open-source J2SSH SSHTools libraries [13]. First, we instantiate a CobaltSubmitter object, which takes as parameters the hostname, the username, the port, and the path to the user's private key. In order to avoid a situation where the user would need to enter their passphrase in cleartext, we used a function in the CoG Kit which, upon execution of the Karajan script, masks the passphrase console input with random characters. Assuming all of the authentication information is correct, and that connection and authentication are successful, we are left with an initialized CobaltSubmitter object, through which we can submit jobs using its submit() method. Any command line argument which can be passed to Cobalt's cqsub command can be given as a parameter to the CobaltSubmitter's submit() method. This method formulates a cqsub command based on the given parameters and executes it on the server. Cobalt, after adding the job to its queue, responds with a job ID number, which the CobaltSubmitter's submit() method returns as a string. These two processes, initialization and submission, are essentially all we need to submit jobs to a Cobalt system. However, the CobaltSubmitter object would be useless for workflow purposes without a third function: job status monitoring. Simply submitting jobs in parallel does not preserve dependencies; what we need to do is not only submit the job, but continually monitor its state, moving on to subsequent jobs in the workflow only when the job has finished executing.

Cobalt provides the command cqstat for queue monitoring purposes. The CobaltSubmitter object has a method status() which takes the ID of a job as a parameter, executes the cqstat command for that particular job, and returns a string representing the job's state. If the cqstat command returns a blank table, the job in question has completed execution and left the queue, so status() returns the string "finished." Having written this code in Java, we can now use our Java2Karajan tool to generate our bindings for us. Once we do that, one further step remains: defining a dedicated Karajan element which combines the job submission and job monitoring methods provided by the CobaltSubmitter bindings into a single task element. This element invokes the submit() method, placing the returned job ID string in a variable. We then enter a loop, using Karajan's looping facilities, in which we repeatedly invoke status() with that returned job ID string as a parameter, exiting the loop when the method returns "finished." Of course, we wait an arbitrary amount of time between invocations of status() to prevent undue stress on the server. It should be noted that although polling is usually regarded as a dirty word, it is necessary for our purposes, as we can provide no callback function to inform us of the job's status. While polling is not an ideal solution, the flexibility afforded by our approach makes it worth the relatively crude implementation. Having created this element, we now have a simple, convenient way to submit Cobalt jobs from a Karajan workflow. A simple abbreviated workflow can then, for instance, execute two such Cobalt jobs in parallel and a third one afterward, by placing two of these elements inside a parallel construct followed by a third one in sequence.
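To make the submit-then-poll pattern concrete, here is a minimal Java sketch of driving such a submitter directly; the interface below is only a stand-in whose method signatures are assumptions based on the description above, not the actual CobaltSubmitter class from the CoG Kit repository.

// Sketch only: stand-in for the CobaltSubmitter described above; real signatures may differ.
interface CobaltSubmitter {
    String submit(String executable, int nodes, int minutes) throws Exception; // wraps cqsub
    String status(String jobId) throws Exception;                              // wraps cqstat
}

public class CobaltWorkflowStep {
    // Submit a job and block until cqstat reports it has left the queue.
    public static void runJob(CobaltSubmitter submitter, String executable,
                              int nodes, int minutes) throws Exception {
        String jobId = submitter.submit(executable, nodes, minutes);
        while (!"finished".equals(submitter.status(jobId))) {
            Thread.sleep(30000); // pause between polls to avoid stressing the server
        }
    }
}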

4 Qstat Monitor: observing the workflow

As we developed the Cobalt submission classes, we concurrently developed the Qstat Monitor, which is a graphical component displaying the current state of a parallel system. It provides the user with a clean, intuitive interface to a cluster or grid system. Essentially, it parses the output of a qstat command into a Swing JTable. The table is color-coded (running jobs are green, queued ones are yellow) and allows users to customize which fields they would like to view. Jobs which have left the queue remain in the Qstat Monitor's table, colored blue, providing the user with a record of completed jobs. The monitor, in addition to being of general use for users of BGL and other parallel systems, was specifically relevant for the Cobalt workflow research, because it allowed us to observe the queue's status throughout the various stages of the workflow execution. We can check if a given job has begun execution or if it is still waiting, and we can monitor other system activity; for example, perhaps the system is particularly backlogged with other users' jobs, which will slow down our workflow execution. The Qstat Monitor also offers job submission functionality, using the same Cobalt submission classes used in the Karajan workflow integration. Through a dialog box, the user can fill in the relevant information for the job (wall time, number of nodes, etc.) and submit the job directly through the Qstat Monitor program.
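The color-coding described above can be pictured with a standard Swing table cell renderer; the following is only a minimal sketch under an assumed column layout and assumed state strings, not the Qstat Monitor's actual implementation.

import java.awt.Color;
import java.awt.Component;
import javax.swing.JTable;
import javax.swing.table.DefaultTableCellRenderer;

// Colors each row by the job state found in a (hypothetical) "State" column.
public class JobStateRenderer extends DefaultTableCellRenderer {
    private final int stateColumn;

    public JobStateRenderer(int stateColumn) {
        this.stateColumn = stateColumn;
    }

    @Override
    public Component getTableCellRendererComponent(JTable table, Object value,
            boolean isSelected, boolean hasFocus, int row, int column) {
        Component c = super.getTableCellRendererComponent(
                table, value, isSelected, hasFocus, row, column);
        String state = String.valueOf(table.getValueAt(row, stateColumn));
        if ("running".equalsIgnoreCase(state)) {
            c.setBackground(Color.GREEN);
        } else if ("queued".equalsIgnoreCase(state)) {
            c.setBackground(Color.YELLOW);
        } else if ("done".equalsIgnoreCase(state)) {  // assumed marker for jobs that left the queue
            c.setBackground(Color.BLUE);
        } else {
            c.setBackground(Color.WHITE);
        }
        return c;
    }
}

Installing the renderer with table.setDefaultRenderer(Object.class, new JobStateRenderer(stateColumn)) would then color every cell of a row according to that row's job state.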

5 Implications and possibilities

Although our integration of a custom queueing system library with Karajan is fairly basic, it has significant implications for researchers interested in maximizing their usage of diverse parallel systems. While Karajan previously supported heterogeneous workflows, allowing multiple systems and security contexts, we now have the ability to fine-tune our workflows to take advantage of the unique features provided by specific queueing systems.

This approach is by no means limited to Cobalt.

5.1 Extensibility: adding PBS system support

As a demonstration of the extensibility of our approach, we developed a set of PBS submission classes which can be integrated with a Karajan workflow in the same way as the Cobalt classes. PBS is a widely used batch queueing system; here at Argonne, it is used on the UC/ANL TeraGrid [15]. We use the same SSH methods as for the Cobalt classes. What changes is the structure of the qsub command and the arguments available to the user. Having implemented the methods to generate the qsub command, and with the features provided by our Java2Karajan tool, creating the corresponding PBS submission and monitoring elements is a fairly straightforward task. This makes possible a Karajan workflow combining Cobalt and PBS elements, as well as Karajan's existing task submission elements, and potentially any other system for which a library is available. Additionally, we implemented PBS functionality in the Qstat Monitor to demonstrate that component's extensibility. The Qstat Monitor can switch between the two queueing system modes, providing features specific to each, while the underlying SSH interaction mechanism remains the same.
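One way to picture this extensibility is as a small command-builder abstraction in which only the command formatting differs per system while the SSH machinery is shared; the interface below and the exact flag spellings are illustrative assumptions rather than the project's actual classes, and the options should be checked against the cqsub and qsub manuals.

// Illustrative only: a tiny abstraction over per-system submit commands.
// Flag spellings below are assumptions from memory; check the cqsub/qsub manuals.
interface QueueCommandBuilder {
    String submitCommand(String executable, int nodes, int walltimeMinutes);
}

class CobaltCommandBuilder implements QueueCommandBuilder {
    public String submitCommand(String executable, int nodes, int walltimeMinutes) {
        // e.g. "cqsub -t 30 -n 64 /path/to/app"
        return String.format("cqsub -t %d -n %d %s", walltimeMinutes, nodes, executable);
    }
}

class PbsCommandBuilder implements QueueCommandBuilder {
    public String submitCommand(String executable, int nodes, int walltimeMinutes) {
        // e.g. "qsub -l nodes=64 -l walltime=00:30:00 /path/to/script"
        return String.format("qsub -l nodes=%d -l walltime=%02d:%02d:00 %s",
                nodes, walltimeMinutes / 60, walltimeMinutes % 60, executable);
    }
}

The SSH connection and command execution code is shared; only the builder differs, which is essentially the extensibility argument made above.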

5.2 Further research: graphical workflows in the Qstat Monitor

One possibility for further research in this area might be the integration of the Karajan workflow with the graphical Qstat Monitor component. Instead of defining workflows in Karajan XML, the Qstat Monitor component could be expanded to allow the graphical creation of workflows, which could then be passed to the Karajan engine. This would combine the customizability offered by the custom queueing system submission libraries and the convenient graphical interface provided by the Qstat Monitor. It may also be possible to integrate the queueing system-specific features described in this paper with the existing Karajan GUI application [8].

6 Conclusion

We have demonstrated an effective way to extend Karajan to take advantage of the unique features provided by the Cobalt queueing system, further improving Karajan's client-based heterogeneous workflow capabilities. In doing so, we have also demonstrated a general methodology for augmenting Karajan through its Java binding functionality. Furthermore, our Karajan implementation works without installing any grid software, and authenticates through standard SSH public key authentication. As part of the CoG Kit, this research expands the commodity vision, in which we integrate grid resources with non-grid resources.

7 Availability

The Java CoG Kit is available through its homepage at http://wiki.cogkit.org. A Java Web Start release of the Qstat Monitor is available at http://wiki.cogkit.org/index.php/Java_CoG_Kit_Qstat. Code referred to in this paper can be found in the qstat section of the CoG Kit's Subversion repository, viewable at http://svn.sourceforge.net/viewvc/cogkit/trunk/five/qstat/. Instructions for downloading the repository to Eclipse, using the Maven build system, can be found at http://wiki.cogkit.org/index.php/MavenRepository.

Acknowledgments

The REU was supported by NSF REU site grant 0353989. The submitted manuscript has been created by the University of Chicago as Operator of Argonne National Laboratory ("Argonne") under Contract No. W-31-109-ENG-38 with the U.S. Department of Energy.

The U.S. Government retains for itself, and others acting on its behalf, a paid-up, nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.

References

[1] Argonne National Laboratory BG/L System. Web Page. Available from: http://www.bgl.mcs.anl.gov/.
[2] Cobalt: System Software for Parallel Machines. Web Page. Available from: http://www-unix.mcs.anl.gov/cobalt/.
[3] Cobalt: An Open Source Platform for HPC System Software Research. Web Page. Available from: http://www-unix.mcs.anl.gov/cobalt/Cobalt-epcc-10-05.pdf.
[4] Java CoG Kit Karajan Workflow Reference Manual. Web Page. Available from: http://wiki.cogkit.org/index.php/Java_CoG_Kit_Karajan_Workflow_Reference_Manual.
[5] Condor: High Throughput Computing. Web Page. Available from: http://www.cs.wisc.edu/condor/.
[6] The Globus Toolkit. Web Page. Available from: http://www.globus.org.
[7] Java Commodity Grid (CoG) Kit. Web Page. Available from: http://www.cogkit.org.
[8] Java CoG Kit - Karajan GUI. Web Page. Available from: http://www.cogkit.org/release/4_1_2/webstart/#karajan-cog-workflow-gui.
[9] Load Sharing Facility. Web Page, Platform Computing, Inc. Available from: http://www.platform.com/.
[10] Portable Batch System. Web Page, Veridian Systems. Available from: http://www.openpbs.org/.
[11] Altair PBS Professional. Web Page, Altair Engineering. Available from: http://www.altair.com/software/pbspro.htm.
[12] gridengine: Home. Available from: http://gridengine.sunsource.net.
[13] J2SSH SSHTools. Web Page. Available from: http://sourceforge.net/projects/sshtools.
[14] Simplified Wrapper and Interface Generator (SWIG). Web Page. Available from: http://www.swig.org/.
[15] TeraGrid. Web Page, 2001. Available from: http://www.teragrid.org/.
[16] TeraGrid Portal. Web Page. Available from: http://www.teragrid.org/userinfo/portal.php.
[17] IBM Tivoli Workload Scheduler. Web Page. Available from: http://www-306.ibm.com/software/sysmgmt/products/support/IBMTivoliWorkloadScheduler.html.


Workflow-level Parameter Study Management in Multi-Grid Environments by the P-GRADE Grid Portal*

Peter Kacsuk, Zoltan Farkas, Gergely Sipos, Adrian Toth, Gabor Hermann

Abstract Workflow applications are frequently used in many production Grids. There is a natural need to run the same workflow with many different parameter sets. Unfortunately current Grid portals either do not support this kind of applications or give only specialized support and hence users are obliged to do all the tedious work needed to manage such parameter study applications. P-GRADE portal has been providing a high-level, graphical workflow development and execution environment for various Grids (EGEE, UK NGS, GIN VO, OSG, TeraGrid, etc.) built on second and third generation Grid technologies (GT2, LCG-2, GT4, gLite). Feedback from the user communities of the portal showed that parameter study support is highly needed and hence the next release of the portal will support the workflow-level parameter study applications. The current paper describes the semantics and implementation principles of managing and executing workflows as parameter studies. Two algorithms are described in detail. The black box algorithm optimizes the usage of storage resources while the PS-labeling algorithm minimizes the load of Grid processing resources. Special emphasis is on the concurrent management of large number of files and jobs in the portal and in the Grids as well as providing a user-friendly, easy-to-use graphical environment to define the workflows and monitor their parametric study execution.

1. Introduction One of the most promising utilizations of Grid resources comes to life with parameter study (or sometimes written as “parametric study” or “parameter sweep”) applications where the same application should be executed with a large set of input parameters. Such parameter study applications are easy to implement in the Grid since the different executions started with different parameters are completely independent. Indeed, there are several projects [1], [2], [3] that demonstrated that parameter study applications are easily manageable in the Grid. However, most of these projects tackled only single job based applications. The real challenge comes when complex applications consisting of large number of jobs/services connected into a workflow should be executed with many different parameter sets. There have been only two projects that tried to combine parameter studies with workflow-level support in the Grid. ILab [4], [5], [6] enables the user to create a special parameter study oriented workflow. With the help of a sophisticated GUI, the user can explicitly define how to distribute and replicate the parameter files in the Grid and how many independent jobs are to be launched for each segment of the data files. This approach is very static restricting the exploitation of the dynamic nature of Grids that enables the dynamic collection of resources. The SEGL [7] approach puts much more emphasis on exploring the dynamic nature of the Grid. They also provide a GUI to define the workflows and to hide the low level details of the underlying Grid. The SEGL workflow provides tools for several levels of parameterization, repeated processing, data archiving, handling conclusions and branches during the processing as well as synchronization of parallel branches and processes. The problem with this GUI is that it might be too sophisticated, requiring very large skill from the application developer. Furthermore both ILab and SEGL are connected to a particular Grid although in case of a parameter study execution there is a large need to exploit as many resources as possible even if they should be collected from different Grids. Finally, neither of them can be used as a service through a Grid portal.

* This research work is carried out under the FP6 Network of Excellence CoreGRID funded by the European Commission (Contract IST-2002-004265) and under the SEE-GRID-2 project funded by the European Commission (Contract number 031775).

Although our approach to supporting workflow-level parameter study applications in the Grid has many similarities with these two projects, there are significant differences, too. Our main goals are as follows:
1. Keep both the workflow GUI and the parameter study support concept as simple as possible. This enables the fast and easy learning of the tool as well as its easy usage.
2. Enable running any existing workflow with different parameter sets without modifying the structure of the workflow.
3. Manage the execution of the workflows on as many Grid resources as possible. Enable the collection of Grid resources from several Grids even if they are based on different Grid technologies.
4. Enable access to the workflow-oriented GUI and the available Grids via a single Grid entry point, i.e., via a Grid portal, without installing any software on the user's machine.
5. Provide a dynamic balance between the usage of processing resources, storage resources and network resources.

Clearly, these goals differ from the main concept of the two above-mentioned projects. The last goal can be found in several other parameter study projects that aim at the support of parameter study applications at the individual job level. In the case of Nimrod/G [3] and Apples/APST [8] the main emphasis is on scheduling, and hence in these projects the fifth goal of our project is dominant. In Nimrod even an economic model is considered during resource scheduling. The starting point for our project was the P-GRADE Grid portal [9], which provides a workflow-oriented GUI as well as workflow-level interoperability between various Grids, even if they are built on different Grid technologies. This means that the same portal can be connected to several different Grids and the portal manages the workflow execution among these Grids according to the users' requirements [10]. The portal even enables the parallel exploitation of the connected Grids, i.e., different jobs of the same workflow can simultaneously be executed on Grid resources taken from different Grids. Such a multi-Grid workflow execution mechanism is a unique feature of the P-GRADE portal, which is now widely used for many different Grids (SEE-GRID, VOCE, EGrid, HunGrid, CroGrid, GILDA, etc.). Besides LCG-2 and gLite based production Grids, the portal is successfully used as a service for the GT2 based UK National Grid Service (NGS) and it was also successfully connected to the GT4 based WestFocus Grid (UK). Recently, the GIN (Grid Interoperation/Interoperability Now) VO of OGF has been supported by the portal, enabling simultaneous access to all of its resources coming from different Grids. This portal is also connected to the US OSG and TeraGrid, the UK NGS, the UK WestFocus Grid and EGEE Grids, and hence via this portal all the major Grids of the US and Europe can be accessed, even from the same workflow. Experience with the portal revealed that many applications require not only the single execution of a workflow but rather parameter study support to execute an existing workflow application with many different parameter sets. Therefore, our motivation was to extend the existing single workflow support of the portal towards generic workflow-level parameter study execution support. Such support should enable the automatic starting, execution, monitoring and visualization of all the workflows belonging to the same parameter study. Of course, just as in the case of the single workflow management environment, users should neither need to know any details of the underlying Grids nor be forced to use any particular programming language. Even legacy codes can be used as services in the workflows if the portal is integrated with the GEMLCA legacy code architecture service [11]. In order to reach the five main goals mentioned above we have developed two algorithms to manage workflow-level parameter studies across multiple Grids. The first algorithm is based on the "black box" concept, which minimizes the required storage resources of the portal but results in superfluous job executions in many cases. The second algorithm, called the PS-labeling algorithm, minimizes the usage of processing resources of the Grids but increases the storage needs of the portal. They represent two extreme points on the scale of possible execution management algorithms. The algorithms should be mixed according to a dynamic and adaptive scheduling algorithm, which is the subject of our future research. The paper first introduces the workflow concept of the P-GRADE portal in Section 2.
The next section explains the "black box" execution semantics and its portal support, both at the user interface and the portal workflow manager level.

Section 4 introduces the PS-labeling algorithm and compares it with the "black box" execution mechanism. Section 5 summarizes the various problems that should be tackled in a parameter study environment (resilience, multi-Grid execution, large numbers of processes and workflows, dynamic changes of the input parameter set, etc.). Finally, Section 6 compares our research with related work.

2. The workflow concept of P-GRADE portal

When designing and implementing the P-GRADE portal, we focused on the following basic principles.

Simplicity and expressiveness of the user interface

The P-GRADE portal provides a graphical editor to develop workflow applications. The workflow graph is a simple DAG (Directed Acyclic Graph) where nodes of the graph are either jobs (sequential, MPI or PVM) or legacy code services (if GEMLCA is integrated with the portal). For each node, input and output ports can be defined. Input ports represent the input files to be used by the node and output ports represent output files to be generated by the node. Input and output ports can be connected by directed arcs (from the output port of one node to the input port of another). The arcs represent the necessary file transfer between the two connected nodes. Indeed, the portal run-time system automatically takes care of these predefined file transfers, so the user is completely relieved of transferring files among jobs during the workflow execution. A node can be executed if all of its input arcs have received the necessary input files. This very simple dataflow semantics enables the parallel execution of those nodes that are simultaneously supplied with the required input files. Notice that, based on these semantics, the user can easily achieve two-level parallelism. The first level is among nodes of parallel branches of the workflow (inter-node parallelism); the second level is within a node if the job executable defined for the node is a parallel code (MPI or PVM). This second level is called intra-node parallelism. To keep the concept as simple as possible, neither if-then-else nor loop constructs are provided in the graph. (However, based on the existing graph facilities, the user can easily create if-then-else graph templates [12].) In order to illustrate the graphical workflow concept of the P-GRADE portal, Figure 1 shows a simple example workflow used for solving the equation Ax = B, where A is a matrix and B and X are vectors. The application consists of 5 jobs (all of them having sequential executable code). The first job, called "Separator", accepts A and B as input parameters on its input port, separates them and then copies A to the jobs "Invert_A" and "A_mul_X", and copies B to "Multip_B" and "Subtr-B". The job "Invert_A" creates the inverse of A, which is multiplied by B in job "Multip_B". The output of "Multip_B" is the sought X vector. The next two jobs are used to check the quality of the result.

Figure 1 Workflow for Computing the Ax=B equation
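To illustrate the dataflow rule above (a node may run once every one of its input ports has received its file), here is a minimal, self-contained Java sketch that walks the Ax = B workflow of Figure 1; the node and port names follow the example, but the classes and the wiring details are hypothetical and are not part of the portal.

import java.util.*;

// Minimal dataflow sketch: a node runs when all of its input ports have a file.
// Assumes every referenced port exists; this is an illustration, not portal code.
public class DataflowSketch {
    static class Node {
        final String name;
        final Set<String> waitingInputs;          // input ports still missing a file
        final Map<String, List<String>> outputs;  // output port -> "Node.port" destinations
        Node(String name, Set<String> inputs, Map<String, List<String>> outputs) {
            this.name = name;
            this.waitingInputs = new HashSet<>(inputs);
            this.outputs = outputs;
        }
        boolean ready() { return waitingInputs.isEmpty(); }
    }

    public static void main(String[] args) {
        Map<String, Node> wf = new LinkedHashMap<>();
        wf.put("Separator", new Node("Separator", Set.of("in0"),
                Map.of("A", List.of("Invert_A.in0", "A_mul_X.in0"),
                       "B", List.of("Multip_B.in1", "Subtr_B.in1"))));
        wf.put("Invert_A", new Node("Invert_A", Set.of("in0"),
                Map.of("invA", List.of("Multip_B.in0"))));
        wf.put("Multip_B", new Node("Multip_B", Set.of("in0", "in1"),
                Map.of("X", List.of("A_mul_X.in1"))));
        wf.put("A_mul_X", new Node("A_mul_X", Set.of("in0", "in1"),
                Map.of("AX", List.of("Subtr_B.in0"))));
        wf.put("Subtr_B", new Node("Subtr_B", Set.of("in0", "in1"), Map.of()));

        wf.get("Separator").waitingInputs.remove("in0"); // the user supplies A and B

        boolean progress = true;
        Set<String> done = new HashSet<>();
        while (progress) {
            progress = false;
            for (Node n : wf.values()) {
                if (!done.contains(n.name) && n.ready()) {
                    System.out.println("executing " + n.name);   // submit the job here
                    done.add(n.name);
                    for (List<String> dests : n.outputs.values())
                        for (String d : dests)                    // deliver files along arcs
                            wf.get(d.split("\\.")[0]).waitingInputs.remove(d.split("\\.")[1]);
                    progress = true;
                }
            }
        }
    }
}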

Exploiting the available services of existing Grids

Instead of developing a new kind of middleware, as has been done in several other parameter study related Grid projects [3], [8], our aim was to exploit the available services of existing production Grids. For this reason we have not developed our own enactment service; rather, we use Condor DAGMan [13]. If there are alternative services, we use them in order to ensure Grid interoperability through the portal. For example, the portal can access, process and visualize the information in both MDS and BDII type information systems. However, the portal should work even if a specific service is not available. For example, the portal can exploit the EGEE broker services to assign jobs to Grid resources, but it can also manage GT2 Grids where such a broker does not exist (e.g. the UK NGS). In the former case the portal automatically generates a default JDL file based on the user-created workflow definition and enables the user to tune this JDL file with additional visual editing. In the latter case, the portal enables the user to explicitly assign jobs to Grid resources. In both cases, the user should first select the Grid where the job is to be executed. For different jobs of the workflow the user can select different Grids, either with or without a broker.

Tailoring the Grids to user needs

The portal administrator can connect the portal to several Grids. The portal maintains a list of connected Grids common to all users. This list cannot be modified by the users, only by the portal administrator. For each connected Grid a list of available resources is provided for every single user. The content of these Grid resource lists can be modified by the user. Grids usually have many unreliable sites, and the user can remove from his Grid resource list those sites that he identifies as unreliable. Conversely, if he knows of extra sites not shown by the Grid resource list, he can easily add those sites to his own Grid resource list. If later the user would like to assign a job to a Grid resource, the portal offers the available Grid resources according to these Grid resource lists.

Flexibility, expandability, standardization

In order to provide a flexible, easy to expand and standard portal, we selected the Gridsphere portal framework technology [14] and used it to build our portal. Gridsphere is used by a large community, enabling the exchange of available portlets that are written according to the Gridsphere standard. It also enables the easy tailoring of the portal to specific user needs. Our aim with the P-GRADE portal was to create a workflow-level core portal that can easily be extended and adapted to the needs of different user communities.

3. The "black box" execution semantics of workflow-level parameter study execution

The simplest approach to supporting parameter studies at the workflow level is based on the "black box" execution semantics. It means that we consider a workflow as a black box that should be executed with many different parameter sets. These parameter sets are placed on the so-called PS input ports of the workflow. An input port is called a PS input port if a set of parameter files can be received on that port. If a workflow has one such PS input port, it should be executed as many times as there are elements in the parameter file set of that port. If there are several PS input ports, the workflow should be executed according to the cross-product of these input sets.
From now on, a workflow (WF) which has at least one PS input port is called a PS workflow (PS-WF). The concept of workflow-level PS support based on the "black box" execution semantics is illustrated in Figure 2. The original WF, consisting of 4 jobs, is considered as a black box that has two input ports capable of accepting inputs from the outside world. On both ports the user can provide several input sets, and the workflow should be executed with the cross-product of these input sets. Figure 2 shows a case where input port 0 accepts an input set with two elements (values 1 and 2) and input port 1 accepts an input set with three elements (values 3, 4 and 5). Therefore, the workflow should be executed six times, as shown in Figure 2. In order to manage the execution of workflows according to the "black box" execution semantics, the workflow manager of the portal was extended in the following way. Let

M = N1 × N2 × … × Nm

where m is the number of PS input ports and Ni denotes the number of input files on the i-th PS input port. At run-time the portal PS-WF manager generates M executable workflows (e-WFs) from the original PS-WF. Every e-WF is labeled by m labels: 1-n1, 2-n2, …, m-nm.

Figure 2. Concept of "black box" execution semantics
The internal structure of label_i is i-n_i, where i identifies the i-th PS input port and n_i represents the ordering number of the input file taken from the i-th PS port in the given execution instance (0 ≤ n_i < N_i). Figure 2 also shows these labels for each of the 6 execution instances. This labeling scheme identifies for the PS-WF manager which input file to take from the different PS input ports in the case of the different execution instances (e-WFs). It also helps in identifying the output files generated at the output ports: every output file is labeled with the label of the e-WF that generated it.
Notice that output files can be local or remote. Remote output files are always permanent, and once they are produced by an e-WF they can be immediately read by the user. This enables the user to study the partial results even before an e-WF has completed. Local output files can be permanent or volatile. Permanent means that the user would like to get access to this output file only when the whole e-WF execution has been completed. These partial results are collected and stored by the portal while the e-WF runs. When the e-WF is completed, these files are zipped (together with the standard output and error logs) and placed by the portal on a Grid storage resource defined by the user. These local permanent files are typically small; by collecting and storing them at the portal, the number of accesses to Grid storage resources, and hence the overall execution time, can be significantly reduced. Finally, local volatile files represent temporary partial results. Once they are consumed by the connected job(s) they can be removed from the portal, which reduces the load on the portal's storage resources.
Developing and running PS-WF applications according to the "black box" execution semantics requires three main steps. The first step is the development of the WF application that the user would like to run as a PS-WF application. The development process of a WF application has been described in previous publications [9] as well as in the User's Manual [15] of the current service portals, and hence we do not describe it in this paper; the basic concepts are summarized in Section 2. The second step is the transformation of the WF into a PS-WF. Finally, the third step includes the submission of the PS-WF application to the Grid and the monitoring of its execution. Consequently, the PS-WF user interface has two major parts:
1. definition of PS-WF graphs, and
2. monitoring the PS-WF graph execution.
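Before turning to these two parts of the user interface, the black-box labeling scheme defined above can be illustrated with a short sketch that enumerates the e-WF instances as the cross product of the PS input port file sets and attaches the i-n_i labels to each instance. It is an illustration only, not part of the portal code; the port numbering and file lists are assumptions chosen to match the Figure 2 example (two PS ports with 2 and 3 files).

```python
from itertools import product

def enumerate_ewfs(ps_port_files):
    """ps_port_files maps a PS input port index i to its ordered list of input files.
    Yields (label, file_assignment) pairs, one per execution instance (e-WF),
    where the label is the concatenation of the i-n_i parts."""
    ports = sorted(ps_port_files)                      # PS port indices i
    counts = [range(len(ps_port_files[i])) for i in ports]
    for choice in product(*counts):                    # cross product of 0 <= n_i < N_i
        label = "_".join(f"{i}-{n}" for i, n in zip(ports, choice))
        files = {i: ps_port_files[i][n] for i, n in zip(ports, choice)}
        yield label, files

# Example matching Figure 2: PS port 0 holds 2 files, PS port 1 holds 3 files,
# which yields the 6 instances 0-0_1-0 ... 0-1_1-2.
example = {0: ["a0.dat", "a1.dat"], 1: ["b0.dat", "b1.dat", "b2.dat"]}
for label, files in enumerate_ewfs(example):
    print(label, files)
```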

3.1 PS-WF graph definition

In order to turn a WF application into a PS-WF application, the graphical Workflow Editor (WE) of the portal was slightly extended. The user can open the existing WF in the WE and can turn any of its existing input ports into PS input ports. To illustrate the process we use the same Ax_EQUAL_B workflow application that was shown in Section 2. The task is to modify this WF so that it solves the equation for a set of A and B parameters. Figure 3 shows how to turn the input port of the Separator job into a PS port.
Notice the difference between the definition of an input port and of a PS input port. In the case of a normal input port, a file is associated with the port. This file can be either local (originating from the user's machine and part of the input sandbox) or remote (placed in a storage resource of the Grid). In the case of a PS port (Figure 3), a directory is associated with the port. This directory should always be placed in a storage resource of the Grid. The user should place the series of input files into this directory, which must not contain any other file. Currently the portal does not provide a portlet to support placing the input files into the selected Grid storage resource; the user should use the command line interface of the actual Grid. E.g., in EGEE Grids there are file catalog commands by which the user can place the files in such a directory. After defining the PS input ports the user should identify the Grid and the storage resource where the local permanent files should be stored at the end of each e-WF execution.
In summary, turning an existing WF into a PS-WF is an extremely easy task: simply turn some of the input ports into PS input ports and define the target Grid storage resource for the local permanent files. This was exactly our aim: to simplify for the user the process of utilizing existing workflows and running them as parameter studies.

Figure 3. Definition of a PS input port

3.2 Monitoring the PS-WF graph execution

Monitoring even a single job is important for the user, not to mention the case when he runs thousands of jobs as part of a PS-WF execution. The challenge here is how to visualize the execution status of thousands of e-WFs and jobs in an easily understandable and manageable way. Again, the monitoring of a single WF as managed by the portal was a good starting point. It was only slightly extended, and yet it helps to monitor all the e-WFs and all their jobs in an efficient way.
The ordinary WFs of a user are listed by the portal in the Workflow Manager window. Here the user can submit a WF, attach the WF to the Workflow Editor to see its graphical view, and delete the WF. Moreover, the "Details" button enables the user to see the details of the WF, i.e., the component jobs and their assignment to Grid resources. The PS-WFs are listed in the Workflow Manager window in the same way as the WFs. The only difference is that the PS-WFs have a "PS Details" button to show their internal details. Figure 4 shows a snapshot of the Workflow Manager window where the original "Ax_EQUAL_B_seegrid_broker" WF and its PS-WF version ("Ax_EQUAL_B_PS") are listed before submitting the PS-WF.

Figure 4. The Workflow Manager window before submitting the PS-WF

Once the user submits the PS-WF (by simply clicking the corresponding "Submit" button), the portal workflow manager (WM) creates all the e-WFs defined by the cross-product of the PS input ports' file sets. The WM then submits simultaneously as many e-WFs as are permitted by the portal administrator. In principle all the e-WFs could be submitted into the Grid. However, every e-WF submission requires significant resources both from the portal server and from the underlying Grid. In order to prevent the flooding of the portal and the Grid by the e-WFs and jobs of a single user, the portal administrator can restrict the number of e-WFs that can be submitted to the Grid simultaneously.
After the PS-WF submission, using the PS Details button the user can see the statistics of the e-WFs: how many were initiated, submitted, or finished, and how many went into Rescue or Error state. Figure 5 shows the situation where the "Ax_EQUAL_B_PS" PS-WF was submitted with 6 input parameter sets. As a result, 6 e-WFs were generated by the portal: 2 of them have already finished, 3 have been submitted, and one is still in the init state, i.e., waiting for submission. The figure also shows that any submitted e-WF can be viewed in detail using the "Details" button. Clicking it, the detailed view of the e-WF shows the component jobs of the e-WF, their Grid resource assignment, and their status. Notice that any e-WF can be aborted; this kills the selected e-WF, but the other e-WFs can continue their activity. Figure 5 also reveals that e-WFs that have finished cannot be viewed in the PS Workflow Details window. Their results (including every stdout and stderr file) are already stored in the defined Grid storage resource, so the user can check those files there.

Figure 5. Detailed monitoring view of a PS-WF's execution

4. PS-labeling algorithm

Although the "black box" execution semantics is easy to understand and apply, it is unfortunately not optimal. In the case of the example shown in Figure 2, the whole PS-WF as a black box is executed 2x3 = 6 times, i.e., all the jobs of the PS-WF are executed 6 times. However, analyzing the PS-WF it quickly turns out that it is enough to execute Job3 and Job5 once for each of the two input values of PS input port 0, i.e., twice. Job4 and Job6 indeed should be executed with the cross-product of the two PS input ports, i.e., 6 times. This example shows that the "black box" execution semantics results in a redundant execution that is not tolerable if the number of files on the PS input ports is in the range of hundreds or thousands, as frequently happens in large-scale scientific simulations.
In order to solve the problem illustrated by Figure 2 we have to modify the "black box" execution mechanism. The new method is based on the "black box" approach in the sense that the user does not have to modify anything in an existing WF, except for simply turning the required input ports into PS input ports. Once this is done, the portal handles the WF as a PS-WF and assigns a unique natural number as PS port identifier to every PS input port. Of course, the user should place the necessary number of input files on every PS input port. Based on the PS port identifiers and the number of input files placed on the PS ports, the portal can identify the optimal execution number for each node of the PS-WF graph.

This is done by the PS-labeling algorithm run by the workflow manager of the portal. The PS-labeling algorithm has two phases:
- the preparation phase, and
- the execution phase.

Preparation phase. The preparation phase is executed by the workflow manager before submitting a PS-WF. Starting from each node having a PS input port, represented by the label i-N_i (where i denotes the unique natural number index of the PS port and N_i denotes the number of input files belonging to this port), the algorithm extends the current label of all dependent nodes by i-N_i. (A node j is dependent on node i if there is a directed path of arcs starting from node i and ending at node j.) At the end of the algorithm every node is either without any label or labeled with a series of labels label_1, label_2, ..., label_n. The former case means that the node does not depend on any PS input port and hence should be executed only once. The latter case means that the node is on at least n paths rooted at n different PS input port nodes. Overall, such a node should be executed N_1 x N_2 x ... x N_n times, corresponding to the cross-product of the initial input file sets.
This label set is extended into a full label set for every node. A full label set is an ordered set of the N_i values, i.e., position i holds the value of N_i. If there was no label for position j (PS port j) in the generated label set, then an empty position is placed there. The full label set of a given node thus shows which PS input ports have an impact on the node, and for those ports it gives the number of input files, indicating the strength of this impact. Example: let us suppose that M = 4, N_0 = 4, N_1 = 3, N_2 = 5 and N_3 = 2, and that the generated label set for node A, which depends on PS port 1 and PS port 3, is 1-3, 3-2. Then the full label set for A will be {_, 3, _, 2}. After completing the labeling, for each node a power set of the full label set is generated. For node A in the example above the power set will look like: {_,1,_,1; _,1,_,2; _,2,_,1; _,2,_,2; _,3,_,1; _,3,_,2}.

Execution phase. After completing the preparation phase, the workflow manager begins to generate and submit the e-WFs labeled as n_1, n_2, ..., n_m, where i identifies the i-th PS input port and n_i represents the ordering number of the input file taken from the i-th PS port in the given execution instance (0 ≤ n_i < N_i). During the e-WF execution the workflow manager tries to substitute the execution of a node with the file(s) resulting from a previous run of the node. First the e-WF label is matched against the power set of each node of the PS-WF. If a node has no power set, it has to be executed only once and is then marked as executed for later e-WFs. If a node has a power set and the matching element is already marked for the node, then this node was already executed; the workflow manager therefore does not submit the job (node) again, but immediately provides the output files that were generated by this node during the marked run. If the matching element is not marked for a node, then the node should be executed. After executing the node, its matching element is marked and the resulting output files are stored either in a Grid storage resource, if they were defined as remote files, or inside the portal, if they were defined as local files.
To illustrate the algorithm let us take the example shown in the preparation phase (M = 4, N_0 = 4, N_1 = 3, N_2 = 5 and N_3 = 2) and consider the e-WF with label "3,2,4,1". If node A has the power set {_,1,_,1; _,1,_,2; _,2,_,1; _,2,_,2; _,3,_,1; _,3,_,2} and the matching element _,2,_,1 is not marked yet, the job or service belonging to node A should be executed. Once it is executed, the matching element _,2,_,1 is marked. If later another e-WF, for example with label "1,2,3,1", is to be executed, then the workflow manager recognizes that the matching element _,2,_,1 is already marked and hence node A must not be executed again. Notice that the price for this optimized usage of processing resources is an increase in the storage capacity requirements of the portal server. There are three types of files handled by the portal in a PS-WF (a small sketch of the labeling and matching steps follows this list):
1. Local volatile files: During normal WF execution these files are immediately removed from the portal when they are consumed by their connected input ports. The same is true for the black box PS-WF algorithm. However, in the PS-labeling PS-WF execution algorithm they should be stored in the portal in order to substitute the re-execution of their producer node. They should be stored as long as there is an e-WF that has not completed yet and must use these files.
2. Local permanent files: During normal WF execution these files are stored in the portal and delivered to the user after completing the WF execution. In the case of the black box PS-WF algorithm they must be stored until the corresponding e-WF has completed. After that they can be removed without waiting for the completion of other e-WFs. In the case of the PS-labeling PS-WF algorithm they must be stored as long as there is an e-WF that has not completed yet and must use these files, so their average storage time is much longer than in the case of the black box PS-WF algorithm.
3. Remote output files: They are stored in Grid storage resources and hence do not increase the storage requirements of the portal, either in the case of the black box or of the PS-labeling algorithm.
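The following sketch illustrates, under simplifying assumptions, the two phases just described: the preparation phase computes for every node the set of PS ports it depends on (by graph reachability), and the execution phase memoizes node runs keyed by the projection of the e-WF label onto those ports, so that a node is executed only once per distinct combination of relevant input files. This is an illustrative reimplementation, not the portal's actual code; it uses 0-based indices throughout, and the run_node function is a hypothetical stand-in for job submission.

```python
from itertools import product

def dependent_ports(dag, ps_port_of_node):
    """dag: node -> list of successor nodes; ps_port_of_node: node -> PS port index
    for nodes that own a PS input port. Returns node -> frozenset of PS ports that
    reach it (the non-empty positions of the node's 'full label set')."""
    deps = {n: set() for n in dag}
    for start, port in ps_port_of_node.items():
        stack = [start]
        while stack:                      # propagate the port index along all arcs
            n = stack.pop()
            if port not in deps[n]:
                deps[n].add(port)
                stack.extend(dag.get(n, []))
    return {n: frozenset(p) for n, p in deps.items()}

def execute_ps_wf(dag, ps_port_of_node, port_sizes, run_node):
    deps = dependent_ports(dag, ps_port_of_node)
    executed = set()                      # marked (node, projected label) pairs
    ports = sorted(port_sizes)
    for combo in product(*(range(port_sizes[i]) for i in ports)):   # all e-WFs
        label = dict(zip(ports, combo))
        for node in dag:                  # assumes dag keys are in topological order
            key = (node, tuple(sorted((i, label[i]) for i in deps[node])))
            if key not in executed:       # run only for an unseen projected label
                run_node(node, {i: label[i] for i in deps[node]})
                executed.add(key)

# Example from the paper: ports 0 and 1 hold 2 and 3 files; Job3/Job5 depend only
# on port 0 and therefore run twice, while Job4/Job6 run 2 x 3 = 6 times.
dag = {"Job3": ["Job4", "Job5"], "Job5": ["Job6"], "Job4": ["Job6"], "Job6": []}
execute_ps_wf(dag, {"Job3": 0, "Job4": 1}, {0: 2, 1: 3},
              lambda n, sel: print("run", n, sel))
```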

In conclusion, the more internal output files are defined as remote files, the lower the storage price we have to pay for the optimized usage of processing resources. Since remote files also help in the online checking and monitoring of the execution of PS-WFs, it is recommended to use remote files instead of local files for PS-WFs. On the other hand, remote files restrict the distribution of the execution of PS-WF nodes among different Grids: usually, a node should be executed in the same Grid where the associated input/output remote file is stored. Since local files are stored on the portal, the PS-WF nodes processing them can be assigned to any Grid connected to the portal (provided that the user has a certificate and VO membership for the given Grid and VO). Therefore, if a user would like to exploit many Grids for the execution of a PS-WF, it is recommended to use local files instead of remote files. In this case the choice of Grids and Grid resources can be made on-the-fly by the portal, provided that it is supported by a meta-broker. Developing the optimal Grid selection algorithm and such a meta-broker is the subject of future research [16].
Notice that from the user interface point of view the PS-labeling PS-WF execution does not differ in any way from the black box PS-WF execution. Therefore it is the task of the portal to decide which algorithm is to be used for a given PS-WF. Such a decision will be the subject of future research. The decision algorithm can be even more fine-grained, i.e., the portal could dynamically change the applied PS-WF execution algorithm node by node according to the size of the generated output files and the execution time of the different nodes. For example, if a node runs very quickly but generates a large local file, it would be better to execute it according to the black box algorithm.

5. Further aspects of PS-WF execution

Fault tolerance has an outstanding importance in PS-WF execution, since a PS-WF typically uses many Grid resources for a long period of time in order to execute all the e-WFs derived from it. The fault tolerant execution of a PS-WF can be supported by the portal in many ways.

First of all, any job of the PS-WF that is assigned to a Grid broker will be resubmitted if the broker rejects its execution on a selected Grid site. If a WF node is assigned by the user to a dedicated Grid resource and that resource fails, or if after three attempts the Grid broker still fails to run a node, the particular node and e-WF are given RESCUE status and the portal sends an e-mail message to the user (if the user requested it). The e-WF goes into rescue mode according to the Condor DAGMan control. The user can re-assign the failed node and resume the execution of the e-WF from the rescue state.
Sending e-mail to the user is of great importance, since a PS-WF application can run for days or even weeks and the user does not want to pay continuous attention to the execution status. However, there are situations where the user's intervention is needed by the portal or the user is interested in accessing partial results of the execution. The user can define such situations for the portal, and whenever a predefined situation occurs the portal sends an e-mail to the user. One such situation is when a job fails and its e-WF goes into rescue state. Another important situation is when the user's certificate is getting close to expiration. The user can also ask for e-mails after the completion of every e-WF or after the completion of the whole PS-WF.
A portal can be used by many users, and many of them can submit PS-WFs. This can result in an overload of the portal, of the connected Grids, or of both. In order to avoid such a harmful situation, the portal administrator can set a limit on the number of workflows that can be submitted by a single user. If this number N is less than the number of generated e-WFs of a submitted PS-WF (M), then the portal simultaneously submits N e-WFs out of the potential M generated e-WFs. Whenever an e-WF is finished, the portal submits another e-WF until all M e-WFs have been submitted (a sketch of this throttling scheme is given below). However, this restriction does not protect other portal users. In order to evenly distribute the portal and Grid resources among the different users, the portal administrator can also set a limit on the number of jobs that can simultaneously be submitted by the portal into the Grid. This limit is applied to all users of the portal, and hence the portal tries to distribute the available job submissions equally among the actual users of the portal.
The P-GRADE portal is a multi-Grid portal, as explained in the Introduction. As such it can distribute e-WFs among all the connected Grids. The question is how to select the right Grid and the best performing Grid resource for a given e-WF or for a given node of an e-WF. The current solution is largely controlled by the user. He can directly assign a node to a certain Grid and Grid resource in the original WF. In this case all the e-WFs generated from the PS-WF created from this WF will assign the job to this Grid site. This solution must be used in Grids where no broker is available. By allocating different Grids and different resources in the selected Grids the user can achieve a static distribution of e-WF nodes among the connected Grids. The situation can be improved by assigning nodes to brokers in Grids where brokers are available. In this case the user statically defines in advance which Grid is to be used for the execution of a node but gives the Grid broker the freedom to assign the node dynamically within that particular Grid.
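The submission throttling mentioned above can be sketched as follows: at most N of the M generated e-WFs are in the Grid at any time, and a new e-WF is submitted whenever one finishes. This is only an illustration of the policy, not portal code; the submit and is_finished callbacks are hypothetical placeholders for the portal workflow manager's operations.

```python
import time

def throttled_submit(ewfs, limit, submit, is_finished, poll_interval=30):
    """Keep at most `limit` e-WFs in the Grid; submit the next one when a slot frees up."""
    pending = list(ewfs)          # the M e-WFs generated from the PS-WF
    running = []
    while pending or running:
        # reclaim the slots of finished e-WFs
        running = [e for e in running if not is_finished(e)]
        # fill free slots up to the administrator-defined limit N
        while pending and len(running) < limit:
            ewf = pending.pop(0)
            submit(ewf)
            running.append(ewf)
        if pending or running:
            time.sleep(poll_interval)
```
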
Assigning nodes to different Grids, and particularly to different Grid brokers, results in a static or semi-dynamic distributed processing of the e-WFs among the connected Grids and among the Grid sites of those Grids. A fully dynamic solution could be reached if a Grid meta-broker were available that can dynamically select Grids. We are currently engaged in research to define and create such a Grid meta-broker [16].
Users of long running PS-WFs would like to dynamically access the partial results generated by the e-WFs and modify the input data sets according to these partial results. The P-GRADE portal enables this dynamic change of input data sets. First of all, the user can check any remote output file as soon as the status of the job generating it becomes finished. If the user decides to modify the input data sets, he can suspend the execution of the PS-WF. In this case all the already submitted e-WFs go into rescue state and the user can modify any input data. After modifying the input data set, the user can make the suspended e-WFs resume their work, utilizing the rescue mechanism of Condor DAGMan. Using the same mechanism even the graph structure could be modified by the user. This facility is not implemented yet and requires further research to ensure the consistency of all the e-WFs when the PS-WF has been changed.

6. Related research

There are two main streams of research that have a strong relationship with our work.

Research on scientific workflows in Grid environments is an exciting and richly investigated subject. A good survey of this field has been written by Yu and Buyya [17], and a whole special issue was recently devoted to the subject in the Journal of Grid Computing [18]. These papers clearly show that, while significant efforts are being made to increase the usability of scientific workflows in Grid environments from many aspects (workflow composition, scheduling, performance estimation, fault tolerance, etc.), there has been little work on how to define and manage parameter studies at the workflow level.
The second main direction relevant to our investigation is research on parameter studies. Some of the existing tools like Condor [1], UNICORE [19] or AppLeS (Application-Level Scheduler) [2] can be used to launch pre-existing parameter studies on distributed resources. There are several important projects specifically aiming at the realization of parameter studies in Grid environments, but most of them consider only individual jobs and not workflows [3], [8], [20], [21]. Usually their main concern is the scheduling of jobs in Grids, although [21] concentrates on data management. In many cases they define specialized middleware in order to optimize the scheduling [8] or introduce new scheduling concepts [3]. A good comparison of several of these tools and projects can be found in [22].
There are very few projects that deal with the integration of parameter study and workflow research. VisPortal [23] supports workflow-level parameter study applications, but it is not a general purpose portal; rather, it is a specialized one supporting only rendering applications. It was based on the Grid Portal Development Toolkit, the predecessor of GridSphere, which is used for building the P-GRADE portal. ILab [4] and SEGL [7] are two main projects that provide a graphical workflow concept particularly tailored to support large parameter study applications. Notice that their goal is different from ours: they want to support parameter study applications by a workflow whose components specify the next stage of the parameter study processing activity. As a result their workflow is much more sophisticated than the P-GRADE workflow and its creation requires much more skill. Our goal was to enable existing DAG workflows to be executed as parameter studies, simultaneously exploiting as many Grids and Grid resources as possible.

Conclusions

The workflow concept of the P-GRADE portal has been very successful and popular among Grid users because of its simplicity and expressiveness. Developing and monitoring Grid applications based on the workflow concept of the portal is extremely easy. Due to these advantages we have been asked to set up the portal for many different Grids (OGF GIN VO, EGRID, SwissGrid, Turkish Grid, BalticGrid, BioInfoGrid, CroGrid, Bulgarian Grid, Macedon Grid, etc.); meanwhile it runs as the official portal of several Grids (SEE-GRID, HunGrid, VOCE) and serves other Grids as a volunteer service (UK NGS, GILDA, etc.). The feedback from the users made it clear that they want parameter study support at the workflow level, but in a way that keeps the simplicity and expressiveness of the original workflow concept. Based on their request we have extended the portal with workflow-level parameter study support. The new version of the portal has been prototyped and was publicly demonstrated at the EGEE conference in September 2006. The new version of the portal (version 2.5), which gives full, service-quality support for workflow-level parameter studies, will be released in November 2006.
We have described two algorithms for executing PS-WFs in a multi-Grid environment. The black box algorithm gives an optimal solution concerning the utilization of storage resources, while the PS-labeling algorithm optimizes the utilization of processing resources. Further research is required to create a dynamic and adaptive integration of the two methods, where the portal can dynamically decide which method to use for a particular node of the PS-WF graph. The current execution method of PS-WFs enables the static distribution of nodes among different Grids and different Grid resources if brokers are not available in the connected Grids. If brokers are available, Grid resources can be assigned dynamically, but the Grid assignment is still static. To provide a fully dynamic allocation of Grids and Grid resources, a meta-broker should be developed and connected to the portal and to the Grids. The development of such a broker is the subject of further research in the framework of the EU CoreGRID project.

References
[1] Thain, D., Tannenbaum, T., and Livny, M., Condor and the Grid, in: Fran Berman, Anthony J.G. Hey, Geoffrey Fox, editors, Grid Computing: Making The Global Infrastructure a Reality, John Wiley, ISBN 0-470-85319-0, pp. 299-336, 2003.
[2] Casanova, H., Obertelli, G., Berman, F. and Wolski, R., The AppLeS Parameter Sweep Template: User-Level Middleware for the Grid, Proceedings of the Super Computing (SC 2002) Conference, Dallas, USA, 2002.
[3] Abramson, D., Giddy, J., and Kotler, L., High Performance Parametric Modeling with Nimrod/G: Killer Application for the Global Grid?, IPDPS'2000, Mexico, IEEE CS Press, USA, 2000.
[4] Yarrow, M., McCann, K., Biswas, R. and van der Wijngaart, R., An Advanced User Interface Approach for Complex Parameter Study Process Specification on the Information Power Grid, Proceedings of the 1st Workshop on Grid Computing (GRID 2002), Bangalore, India, December 2000.
[5] Yarrow, M., McCann, K. M., Tejnil, E., and deVivo, A., Production-Level Distributed Parametric Study Capabilities for the Grid, Grid Computing - GRID 2001 Workshop Proceedings, Denver, CO, November 2001.
[6] McCann, K. M., Yarrow, M., deVivo, A. and Mehrotra, P., ScyFlow: An Environment for the Visual Specification and Execution of Scientific Workflows, GGF10 Workshop on Workflow in Grid Systems, Berlin, 2004.
[7] N. Currle-Linde, F. Boes, P. Lindner, J. Pleiss and M.M. Resch, A Management System for Complex Parameter Studies and Experiments in Grid Computing, in: Proc. of the 16th IASTED Intl. Conf. on PDCS (ed.: T. Gonzales), Acta Press, 2004.
[8] Casanova, H. and Berman, F., Parameter Sweeps on the Grid with APST, in: Fran Berman, Anthony J.G. Hey, Geoffrey Fox, editors, Grid Computing: Making The Global Infrastructure a Reality, John Wiley, ISBN 0-470-85319-0, pp. 773-788, 2003.
[9] P. Kacsuk and G. Sipos, Multi-Grid, Multi-User Workflows in the P-GRADE Grid Portal, Journal of Grid Computing, Vol. 3, No. 3-4, pp. 221-238, 2005.
[10] P. Kacsuk, T. Kiss and G. Sipos, Solving the Grid Interoperability Problem by P-GRADE Portal at Workflow Level, Proc. of the Grid-Enabling Legacy Applications and Supporting End User Workshop, in conjunction with HPDC'06, Paris, pp. 3-7, 2005.
[11] T. Delaitre, et al., GEMLCA: Running Legacy Code Applications as Grid Services, Journal of Grid Computing, Vol. 3, No. 1-2, pp. 75-90, 2005.
[12] R. Lovas et al., Dynamic workflows in the service-oriented P-GRADE portal using Grid superscalar, Austrian Grid Symposium, Innsbruck, 2006.
[13] J. Frey, Condor DAGMan: Handling Inter-Job Dependencies, http://www.cs.wisc.edu/condor/dagman/, 2002.
[14] http://www.gridsphere.org/
[15] http://www.lpds.sztaki.hu/pgportal/v23/manual/users_manual/UsersManualReleaseV2.html
[16] A. Kertesz and P. Kacsuk, Grid Meta-Broker Architecture: Towards an Interoperable Grid Resource Brokering Service, CoreGRID Workshop on Grid Middleware in Conjunction with EuroPar'06, Dresden, 2006.
[17] J. Yu and R. Buyya, Taxonomy of Workflow Management Systems for Grid Computing, Journal of Grid Computing, Vol. 3, No. 3-4, pp. 171-200, 2005.
[18] E. Deelman and I. Taylor (guest editors), Special Issue on Scientific Workflows in Grid Environments, Journal of Grid Computing, Vol. 3, No. 3-4, pp. 151-304, 2005.
[19] Erwin, D. (Ed.), Joint Project Report for the BMBF Project UNICORE Plus, Grant Number: 01 IR 001 A-D, Duration: January 2000 to December 2002, ISBN 3-00-011592-7.
[20] Abramson, D., Lewis, A.
and Peachey, T.: Nimrod/O: A Tool for Automatic Design Optimization, The 4th International Conference on Algorithms & Architectures for Parallel Processing (ICA3PP 2000), Hong Kong, 2000.
[21] H.A. James and K.A. Hawick, Scientific Data Management in a Grid Environment, Journal of Grid Computing, Vol. 3, No. 1-2, pp. 39-51, 2005.
[22] A. DeVivo, M. Yarrow, and K. McCann, A comparison of parameter study creation and job submission tools, Technical Report NAS-01-002, NASA Ames Research Center, Moffett Field, CA, 2001.
[23] http://www-vis.lbl.gov/Publications/2004/LBNL-PUB-893Visportal.pdf#search=%22%22parameter%20study%22%20AND%20%22Grid%20portal%22%22

The Java CoG Kit Experiment Manager

Gregor von Laszewski (1,2), Phillip Zimny (3,1), Tan Trieu (4,1), David Angulo (5,1)

1 Argonne National Laboratory, 9700 S. Cass Ave., Argonne, IL 60440
2 University of Chicago, Computation Institute, Research Institutes Building #402, 5640 South Ellis Ave., Chicago, IL 60637-1433
3 PHILS INSTITUITION
4 Santa Clara University, Department of Computer Engineering, 500 El Camino Real, Santa Clara, CA 95053
5 DePaul University, DAVES ADDRESS

Abstract

In this paper we introduce a framework for experiment management that simplifies the users' interaction with grid environments by managing a large number of tasks to be conducted as part of an experiment by the individual scientist. Our framework is an extension to the Java CoG Kit. We have developed a client-server approach that allows us to utilize the grid task abstraction of the Java CoG Kit and expose it easily to the experimentalist. Similar to the definition of standard output and standard error, we have defined a standard status that allows us to conduct application status notifications. We have tested our tool with a large number of long running experiments and show its usability in practical applications such as bioinformatics.

1 Introduction

Grid computing addresses the challenge of coordinating resource sharing and problem solving in dynamic, multi-institutional virtual organizations [?]. The analogy between the computational grid and the power grid highlights the emphasis on virtualization. When a user plugs an appliance into the power outlet, he/she expects the delivery of power without concern for the whereabouts of the power source. Just as the electric power grid allows pervasive access to electric power, computational grids provide pervasive access to compute-related resources and services [1]. The Grid's focus on integrating heterogeneous, distributed resources for the purpose of high performance computing differentiates it from other technologies such as cluster computing and the Web. The Grid's ability to virtualize a collection of disparate resources to solve problems promises effortless collaboration among the scientific communities.

The construction of the Grid requires the establishment of standards for a secure and robust infrastructure. One such undertaking is the definition of the Open Grid Services Architecture (OGSA), which provides a specification for a standard service-oriented Grid framework [1]. The implementation of the services forms the Grid middleware, and the Globus Toolkit is today's de facto standard Grid middleware [2]. The toolkit provides an elementary set of facilities to handle security, communication, information, resource management, and data management services [1]. However, this set of services may not be compatible with the commodity technologies that Grid application developers use. The Commodity Grid project addresses the incompatibility by creating Commodity Grid (CoG) Kits that define mappings and interfaces between Grid services and particular commodity frameworks such as Java, Perl, and Python [1]. The Java CoG Kit provides more than just a mapping between Java and the Globus Toolkit: it bridges the Java commodity framework and Grid technology. This means it not only defines a set of convenient classes that provide the Java programmer with access to basic Grid services [3], but also integrates a number of sophisticated abstractions, one of which is a workflow system [4]. Hence, it provides a significant feature enhancement to existing Grid middleware [1].

A popular use of the Grid is motivated by the field of bioinformatics, where applications such as Grid-enabled Blast [?] are used to compare base or amino acid sequences registered in a database with sequences provided by the user [?]. Blast runs can generate numerous queries that require hours or even days to complete. Managing such studies requires that scientists maintain the status and outputs of the individual queries, distancing them from the experiment at hand and burdening them with the tedious task of checking for job status and output. In an effort to relieve the scientist from the drudgery of managing output data and provide the scientist with a tool to monitor the progress of his/her jobs, we introduce the concept of an experiment. An experiment can be defined as tasks that are executed on the Grid with their associated output stored in a user-defined location. In this paper we show that the Java CoG Kit is ideally suited to support such a high level service. Using the facilities provided by the Java CoG Kit, we create a user driven experiment management system to simplify the administration and execution of repetitive tasks that use similar parameters.

The user driven experiment management tool combines features of several tools to empower the novice Grid user. It includes features typically found in queuing systems, shells with history, and process monitoring programs such as the well known UNIX ps command. Naturally it is targeted to include specific enhancements for the Grid environment. To emphasize the similarities, let us revisit a typical use case of a user using a UNIX command shell. A user working in the UNIX command shell queries which jobs have been submitted via the history function. The status of a process, a running instance of a program, can be obtained by issuing the ps command. The output provides information such as the process ID, current status, the cumulated CPU time, and executable name. Our experiment management system provides a similar interface, displaying the added experiments in a format that includes the experiment ID, current status, cumulative time the experiment has been queued, and experiment name. However, in extension to the normal command and history management tools of shells, we must integrate user-accessible output and error files on a command-by-command basis.

A preliminary version of the execution manager has been available for years as part of the Java CoG Kit under the name Grid Command Manager (GCM). However, we have enhanced its functionality significantly. The enhancements include experiment status checkpointing, management support for a large number of experiment submissions, and the integration of fault tolerant queues for managing experiment submissions.

The rest of the paper is structured as follows. First, we revisit our requirements that lead us to a redesign of the Grid Command Manager for experiment support. This includes the presentation of a use case for our framework. Next we describe the architecture that fulfills our requirements. We then describe the implementation and present preliminary performance results. We conclude the paper with our thoughts on future work to be conducted.

2 Requirements

The experiment management system has several major requirements, including automated experiment checkpointing, transparent output management, automated version control, metadata management, detailed status reporting, persistent experiment sessions, and scalable experiment status updating. Next we will discuss each of the requirements in more detail.

Automated Checkpointing. A basic assumption that the experiment management system makes about experiments is that they are non-interactive, long running jobs. With long running experiments, the expectation that the host requesting the remote resource maintains an uninterrupted connection with the remote resource is impractical. From this stems the requirement that checkpointing, or saving the state, of an experiment must be a transparent process, so that users do not have to associate experiments with checkpoint files. Instead, once a user submits an experiment he/she must only associate the experiment with its name in order to track its status. To address the overhead of maintaining a persistent connection, the Java CoG Kit abstractions module provides a checkpointing mechanism that enables users to reconnect to a submitted job at a later time.

Transparent Output Management. To shield the user from details about the Grid, the standard error (stderr) and standard output (stdout) are automatically saved in a predetermined experiment path location, to prevent the impression that stdout and stderr have vanished because they reside on the remote execution host or because the experiment has been duplicated (see also Version Control). Such functionality provides the illusion of localized computing while using the Grid.

Version Control. Storage of output files leads to the next requirement of output version control. When an experiment is submitted more than once, the output from its previous runs needs to be stored and accessible for comparison to its future runs. An automated version control system sequentially names versions of output. This takes the responsibilities of renaming, moving, and organizing different versions of output away from the scientist.

Metadata Management. A scientist often has additional information about an experiment that needs to be managed. Such information includes the authors of the experiment, the date of the experiment, and other information pertinent for organizing and documenting an experiment. Hence, an additional requirement is to provide a system to automatically maintain the metadata of each experiment. This system must allow for easy entry of an experiment's metadata as well as allow for changes to be made. It will allow the scientist to reference more than just the output to uniquely identify each experiment.

Application Status Reporting. [why we have done this] Besides retrieving stdout and stderr, we believe that users will benefit from application status reporting. Similar to stdout, we introduce a standard status (stdstatus) that can be used to report more detailed experiment states as well as application specific information (this is an implementation detail). When checking the status of an experiment that reached the failed state, the user may wonder what triggered or caused the failure. The standard status provides this service by trapping signals that interrupted the job. The user can then query the standard status to review the events that occurred before the failure. The use of the standard status goes beyond error reporting; it provides a simple technique for runtime application status notification. For experiments that take days to complete, knowing that the experiment is running is often inadequate. The standard status provides a mechanism for application developers to expose a more detailed record of the application's progress during execution.

Persistent Experiment Sessions. The ability to load information about previous experiments when restarting the experiment manager is an important maintenance tool. In case the experiment manager abruptly shuts down or if the user has multiple instances of the experiment manager running, persistence enables the user to maintain sessions.

Scalable Experiment Status Updating. With persistent sessions, the number of experiments within a session can grow quite large. The task of updating the status of such a large number of experiments can consume a disproportionate amount of computing resources on the client machine. The experiment management system thus needs to simulate thread execution when updating the status of each experiment, rather than creating a thread dedicated to status updating for each experiment.

3 Use Case

[focus on biology case; derive requirements] In this section we describe a scenario for the use of the cog-experiment tool.

The Basic Local Alignment Search Tool (BLAST) is one of the most popular tools for searching nucleotide and protein databases. It tests a nucleotide sequence against a database of known sequences and returns similarities. BLAST offers many different types of queries to the database, including nucleotide to nucleotide, protein to nucleotide, protein to protein, nucleotide to protein, and many more. Biologists use this tool to help them discover the identity of the sequence they are studying or to identify its function by comparing it to similar sequences BLAST finds.

A biologist may find themselves in a scenario where they wish to research a specific genetic sequence by making 500 slight modifications to the sequence, running each through BLAST, and seeing whether the modification produces characteristics similar to those of other sequences. The biologist would have to start by running the original sequence and then run each of the 500 modifications. At the beginning of each submission of the BLAST run, the biologist would have to name the task himself and then keep track of the 500 different names. Upon the completion of the BLAST run the biologist would then have to move his output files into separate directories, which they would have to name and then remember.

Once the biologist has completed his 500 BLAST runs, he now has 500 different outputs to manage. The biologist most likely does not wish to devise a method of organizing all of the different output they have created. Even once organized, there is very little additional data associated with the outputs that would allow the biologist to search through the output files.

This research method is inefficient and time consuming, and not the optimum way a biologist would like to conduct their research. The cog-experiment tool offers a way to make this process not only much simpler but also much more efficient. This tool allows the biologist to set up the submission process to repeat itself as many times as needed, in this case 500, while making a slight modification to the submission parameters each time. For this example, the modification to the parameters would be the file name in which each modified sequence was stored. The submission process no longer requires the biologist's presence. If these submissions took long periods of time to complete, such as several days, the cog-experiment tool would also checkpoint progress on the completion of a submitted task so the researcher would not have to start the task over from the beginning.

In addition to offering a better submission procedure to the biologist, the cog-experiment tool simplifies file management for the biologist. The biologist may pick one name and the cog-experiment tool will automatically assign a sequential version number to each submission. Starting with one, the biologist now has an easy to understand versioning scheme. This auto-naming feature takes away any worries about overwriting data or forgetting the names used for submission. Once each submission has been automatically named, a folder of the same name is created to store that individual submission's files in the user's experiment path location. Now the biologist has all of their submission files properly named and the output files neatly stored in individual folders.

The biologist can use the option to enter metadata about the submissions on an individual level. Information about the specific submission such as author, time, or other notes can be saved to persistent storage. When the biologist does this, it allows them to search through all the outputs. For example, if the biologist wishes to view all the submissions from a certain day, they can simply search for that date and all of the submissions from that date are displayed on the screen.
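The batch-submission part of this scenario can be sketched as follows. This is only an illustration of the workflow the use case describes, not the cog-experiment tool itself; the submit_experiment function, the experiment name, and the file naming pattern are hypothetical placeholders, while the blastall options shown follow the legacy NCBI command line.

```python
def run_blast_study(submit_experiment, num_variants=500):
    """Submit the original sequence plus each modified sequence as one experiment.
    The tool's auto-naming assigns sequential version numbers, so a single base
    name can be reused for every submission."""
    base_name = "blast_sequence_study"            # hypothetical experiment name
    sequence_files = ["original.fasta"] + [
        f"modified_{i}.fasta" for i in range(1, num_variants + 1)
    ]
    for seq_file in sequence_files:
        # each submission differs only in the input-file parameter
        submit_experiment(name=base_name, executable="blastall",
                          arguments=["-p", "blastn", "-d", "nr", "-i", seq_file])

# Stand-in submitter that just prints what would be submitted.
run_blast_study(lambda **kw: print("submit", kw), num_variants=3)
```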

4 Architecture

The architecture of the experiment management system integrates with the Java CoG Kit's layered approach. The experiment management system is a module that reuses the abstractions layer while exposing a command line tool. The abstractions layer provides high level abstractions that include Grid tasks, transfers, jobs, and queues that make developing Grid programs easier [4].

The experiment management system consists of two primary components, an experiment manager and a command line component. These components communicate via a socket, with the experiment manager running as a background job that services requests from the command line component to add, remove, submit, list, and retrieve the status of experiments. Figure 1 depicts the architecture of the experiment management system. The heart of the system is the experiment manager component, which maintains experiment status with a set of four queues: pending, submitted, completed, and failed. An experiment's transition through the queueing system is illustrated by the state diagram in Figure 2. The user has control over two state transitions: adding an experiment to the pending queue, and performing a local submit to move the experiment to the submitted queue. The rest of the state transitions are handled by a background thread, which periodically updates the status of the queued experiments.

Figure 1: The architecture of the Java CoG Kit experiment management framework. (The figure shows the cog-experiment command line interface, the checkpointing experiment manager, the pending, submitted, completed, and failed queues, the experiments repository, and the Grid infrastructure, connected by read/write links.)

The experiment manager uses persistent storage to provide automated experiment checkpointing, transparent output management, and persistent experiment sessions. The automated experiment checkpointing and transparent output management functions rely on an experiment repository to store the checkpoint files for each submitted experiment and to save the stdout, stderr, and stdstatus resulting from an experiment. To provide the persistent experiment session function, the experiment manager periodically checkpoints the status of the four queues to persistent storage, from where the status of the four queues can be reloaded when the system is restarted.

The cog-experiment command line interface component provides access to the experiment manager functions to add, remove, submit, and retrieve the status of experiments. The command line interface also provides other functions such as metadata maintenance and version control, and thus requires read and write access to the experiments repository.

Figure 2: State diagram of an experiment as it transitions through the four queues. (The states shown are Pending, Submitted, Running, Failed, and Completed, with transitions labeled Add, Local Submit, Grid Submit, Local Submission Failure, Grid Submission Failure, Runtime Failure, Completion, and Completion with Failure, from Begin to End.)

[figures changes: class relationships - fill out page - redraw arch w/ lines]
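A minimal sketch of the queue transitions shown in Figure 2 follows. It illustrates the four-queue design described in the architecture, not the Java implementation; the poll_grid_status callback is a hypothetical placeholder for the Java CoG Kit task status query, and running experiments are kept in the submitted queue while their status is reported separately.

```python
from enum import Enum

class Queue(Enum):
    PENDING = "pending"
    SUBMITTED = "submitted"
    COMPLETED = "completed"
    FAILED = "failed"

class ExperimentManager:
    """Maintains the four queues of the architecture; a background thread is
    expected to call update_status() periodically, as described in the text."""
    def __init__(self, poll_grid_status):
        self.queues = {q: [] for q in Queue}
        # hypothetical callback returning "running", "completed", or "failed"
        self.poll_grid_status = poll_grid_status

    def add(self, exp):                      # user-controlled: add to pending queue
        self.queues[Queue.PENDING].append(exp)

    def local_submit(self, exp):             # user-controlled: pending -> submitted
        self.queues[Queue.PENDING].remove(exp)
        self.queues[Queue.SUBMITTED].append(exp)

    def update_status(self):                 # remaining transitions are automatic
        for exp in list(self.queues[Queue.SUBMITTED]):
            status = self.poll_grid_status(exp)
            if status in ("completed", "failed"):
                self.queues[Queue.SUBMITTED].remove(exp)
                self.queues[Queue[status.upper()]].append(exp)
```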

5 Implementation

The implementation of the experiment management system is split into the client and server components.

5.1 Client

[make sure that client and server are clearly separated in different subsections] The experiment manager client is implemented using an argument parser to retrieve the user's desired function. Once the user has entered a command, the experiment manager client parses it into its separate components to determine which method to call. The experiment manager client's methods use instances of the experiment metadata implementation and the experiment output manager. These two classes both use instances of the experiment data manager to help organize an experiment's metadata. [clarity on client/server]

The experiment data manager is designed to hold all of the metadata of an experiment in one object. This class stores the separate pieces of metadata in individual strings. The only methods the experiment data manager class contains are simple get and set methods to set and retrieve information within the class.

The experiment metadata implementation class is implemented using a variety of technologies. It uses JPanel from the Swing package to construct the graphical user interface that is presented to the user for entering the metadata of their experiment. This interface retrieves the metadata and sends it to be saved to persistent storage. The metadata is organized by being placed into an instance of the experiment metadata manager class. This instance is then written to an XML file using the XStream package [?]. The metadata is stored in the same location as the experiment's stderr and stdout, in a directory that shares the same name as the experiment it is associated with. The name of the experiment is ultimately determined not by the user but by the auto-naming method inside the experiment metadata implementation class. This method automatically increments an integer that is attached to the end of the experiment name the user enters. To keep track of all of the experiments that have been created, the experiment metadata implementation uses a vector that contains the string names of all of the experiments that have been added. This vector is stored to persistent storage in XML format using XStream in a file entitled versions.xml. This file is referenced when the auto-naming method determines the correct number to attach to the end of the name entered by the user (a sketch of this auto-naming scheme is given below).

The experiment output manager is designed to handle the user's query based commands. It uses XStream to load the vector stored in versions.xml as well as other saved instances of the experiment data manager class. Once the necessary information has been loaded, the experiment output manager conducts the specified query through the data and returns the results.
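The auto-naming and versioning behavior described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions rather than the actual Java/XStream code: the on-disk registry is simplified to a plain text list instead of the XStream-produced versions.xml, and the directory layout is an assumption based on the description (one directory per auto-named submission under the experiment path).

```python
from pathlib import Path

def next_versioned_name(experiment_path, base_name):
    """Append an incrementing integer to the user-chosen name, using a simple
    'versions.txt' registry in place of the XStream-serialized versions.xml."""
    registry = Path(experiment_path) / "versions.txt"
    known = registry.read_text().split() if registry.exists() else []
    count = sum(1 for name in known if name.startswith(base_name + "_"))
    new_name = f"{base_name}_{count + 1}"          # version numbers start at one
    registry.parent.mkdir(parents=True, exist_ok=True)
    with registry.open("a") as f:
        f.write(new_name + "\n")
    # each submission gets a directory of the same name for stdout/stderr/metadata
    (Path(experiment_path) / new_name).mkdir(parents=True, exist_ok=True)
    return new_name

# Example: repeated submissions under one base name get sequential versions.
for _ in range(3):
    print(next_versioned_name("/tmp/experiments", "blast_sequence_study"))
```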

5.2 Server

The server consists of four threads of execution: the main thread listens for and responds to requests from the client, one thread performs intermittent checkpointing of the queues, and another two threads update the status of submitted experiments.

The main thread instantiates the experiment manager class that is responsible for providing all the necessary methods that expose the interface through which the client communicates with the server. Once the checkpointed queues have been loaded from the experiment path, two new threads are spawned; they are responsible for updating the status of experiments in the submitted queue. The number of threads and their polling intervals can be adjusted for performance fine-tuning. The choice of using a couple of threads to monitor experiment status allows the experiment management system to provide reasonable response time while restricting resource consumption. These threads also automate the retrieval of any output associated with an experiment. Because they can detect an experiment's change in state, these threads update the standard output, error, and status only when necessary, conserving computational resources.

The final thread initiated during the experiment manager startup process saves the states of the four status queues at a configurable polling interval. This functionality addresses the possibility of an abrupt interruption preventing the experiment manager from halting gracefully: the checkpointed queues ensure that the experiment management system can be relaunched with minimal loss.

The server uses four levels of class containment, and Figure 3 illustrates the containment relationship. At the root of the containment relationship is the ExperimentManager class that provides the server-side functions to respond to the commands that the client issues. The commands supported by the ExperimentManager class include add, submit, list, status, and stop. The ExperimentManager class implements the functions based on the four queues that it maintains: pending, submitted, completed, and failed. The queues consist of objects that hold an experiment data structure along with the experiment's id and a dependencies list. The Experiment class is the core data structure mediating communication between the experiment management system and the Grid. The class exposes an interface that simplifies the experiment submission process and incorporates enhanced status reporting through the standard status.

Figure 3: Containment relationship among the primary classes used to implement the experiment manager server. (The classes shown are ExperimentManagerServer, ExperimentManager, ExperimentStatusQueue, ExperimentQueueObject with its id and dependencies, and Experiment.)

The standard status is simply a file with entries describing the current status of an experiment that is written to the working directory where the executable program is invoked. The standardized format of each entry, as shown in Figure 4, permits customized status reporting. However, the applications used must be rebuilt to append additional information about their execution state to the standard status. The standard status, as currently implemented, reports the latter three states of the experiment state diagram: running, completed, and failed. It also supports more detailed error detection by reporting trapped signals that otherwise would have vanished on the remote host.

#CoG: :