Development of NPACI Grid Application Portals and Portal Web Services

M. Thomas, J. Boisseau Texas Advanced Computing Center, The University of Texas at Austin, {mthomas, boisseau}@tacc.utexas.edu

M. Dahan, C. Mills, S. Mock, K. Mueller The San Diego Supercomputer Center, The University of California at San Diego, {mdahan, cmills, mock, kurt}@sdsc.edu

Abstract

Grid portals and services are emerging as convenient mechanisms for providing the scientific community with familiar and simplified interfaces to the Grid. Our experiences in implementing computational Grid portals, and the services needed to support them, have led to the creation of GridPort: a unique, integrated, layered software system for building portals and hosting portal services that access Grid services. The usefulness of this system has been demonstrated with the implementation of several application portals. The system has several distinguishing features: the software is portable and runs on most webservers; written in Perl/CGI, it is easy to support and modify; a single API provides access to a host of Grid services; it is flexible and adaptable; it supports single login between multiple portals; and portals built with it may run across multiple sites and organizations. In this paper we summarize our experiences in building this system, including philosophy and design choices, and we describe the software we are building to support portal development and portal services. Finally, we discuss our experiences in developing the GridPort Client Toolkit in support of remote Web client portals and Grid Web services.

1. Introduction

Grid portals are emerging as a highly convenient mechanism for providing the scientific community with a familiar and simplified interface to the Grid and Grid services [1]. The Grid is a term applied to the infrastructure being constructed to interconnect highly distributed compute, archival, instrumentation, and other resources into a large, parallel, computational resource. The number of Grid portal projects is growing rapidly, and each project has its own solution to the challenges of connecting disparate resources, although many of these systems are converging towards some common approaches [15]. The complexities that Grid portal developers encounter in developing a simple user interface to the Grid include dealing with the limitations of the client browser (Netscape, IE, applets, JavaScript) and the HTTP protocol; choosing the right webserver software (Apache, Netscape); identifying programming languages and mechanisms to use on the webserver to connect to Grid services (Perl/CGI, Java Server Pages, PHP); and working with multiple Grid technologies. While each technology is useful for Grid programming, the lack of interoperable standards and common system components currently prevents portals from sharing information, data, and resources and from arriving at common, reusable solutions.

We have used our experiences in implementing computational science Grid portals for the NPACI (and PACI) program [24], such as the HotPage [25, 31] and other application portals [12, 21, 30, 32], to create a unique, layered system for Grid portal development that facilitates the building of multiple portals using common code: the Perl/CGI-based Grid Portal Toolkit (GridPort) [32, 28]. The toolkit minimizes the work required to build the infrastructure to support and provide services used by portals and applications on the Grid. The significance of this research is that we have demonstrated that GridPort software can be used in production environments, extending access to resources beyond NPACI (Alliance, PSC, and NASA/IPG), and is easily ported to other centers [3, 23]. We have also demonstrated that the GridPort software can be extended to support the Web Services architecture [6] that is being developed for commercial purposes [36] and for Grid computing [10, 11, 34]. In addition, we are developing the GridPort Client Toolkit, which supports remote client access to the NPACI portal services via simple HTTPS/CGI connections.

The remainder of this paper presents a summary of our experiences gained in building Grid portals and developing software for the Grid, and it is organized as follows. In Section 2, we present a discussion of the project background as well as the philosophy followed, design criteria met, and choices made in creating the GridPort system currently being used for production portals. Section 3 contains a brief description of GridPort, the technologies used, and the limitations of this system. In Section 4, we demonstrate the usefulness of this software system with an overview of GridPort-based production portals. In Section 5 we present a discussion of the GridPort Client Toolkit and web services.

2. Background and Motivation

First developed in 1997, the NPACI HotPage was intended to provide users with a web-based single entry point to resources maintained by the NPACI partnership (comprising UC San Diego, Caltech, University of Texas, University of Michigan, UC Berkeley, University of Virginia, and many other partners). As an information service, the website initially provided users with centralized access to documentation about the resources and to their status. In 1999, the Globus project [9] provided the key enabling technologies that were needed in order for users to have real-time, secure access to all HPC resources. The HotPage was expanded to take advantage of many interactive operations enabled by Globus. GridPort was then converted to support multiple application portals, and this version is now available for download [18].

The GridPort software system continues to evolve as a result of work being done to develop the NPACI HotPage user portal and other application portals. The system contains elements that support portal development and interactivity by integrating different Grid and Web services, and it employs a variety of technologies in order to support the large and varied research projects within NPACI and accommodate their diverse requirements. The system is flexible and portable to other centers, sites, and resources and can be used in a modular manner.

2.1. Design Requirements

The driving philosophy behind the design of the GridPort portal system is the conviction that many potential Grid users and developers will benefit from portals and portal technologies that provide universal, easy access to resources and require minimal work by Grid users and Grid application developers. There is clearly a need for large, sophisticated, and advanced projects such as Gateway [1], Cactus [2], Unicore [27], and GriPhyN [16], but many of these projects require large budgets and staff to support their development and use. We believe that for a significant number of individual users and small projects, GridPort is a more appropriate system to enable Grid participation. From the initial design and creation of the informational HotPage, to the construction of the GridPort Toolkit and our current application portals, this has remained a focus of the project and has driven many of its design choices. For example, most computational scientists are not computer scientists, so we have focused on keeping the portal environment very simple, easy to use, and easily accessible. GridPort-based portals require no software downloads or configuration changes on the client side, and run on common web browsers.

Design choices made for this project imposed certain limitations and required tradeoffs, and several were made in order to solve a particular aspect of the Grid portal challenge: that of providing a generalized infrastructure that is accessible to and usable by the computational science community. If every scientist who wished to build a portal had to install the full set of portal software and services needed to connect to the Grid, there would be a tremendous duplication of human effort as well as unnecessary complexity in the resulting network of connected systems. A simpler and more efficient solution is required.
The GridPort approach attempts to solve these challenges by meeting several key design goals:
- be web-based and run anywhere, anytime, to provide universal access;
- provide a scalable and flexible infrastructure to facilitate adding and removing Grid resources, services, jobs, and users;
- maintain tight security by supporting GSI and HTTPS/SSL encryption at all layers;
- create single login across multiple portals and resources;
- use common Grid technologies and standards;
- support and contribute to GGF standards, including the GGF Virtual Organization model [14];
- provide technology transfer by creating a downloadable version of the software for other developers;
- provide an integrated portal services environment for use by other portals and applications; and
- allow client applications and portal services to run on separate machines.

As a result of these design goals, and lessons learned from building several production Grid portals, we have arrived at a Grid portal system that meets most of the goals mentioned above and presents a scalable, generalized solution for Grid portal development. Overall, the project has met all of the goals set above with the exception of the last goal, though a web services based architecture in which clients host Grid application portals on local systems and use the NPACI portal services is in development and has been successfully prototyped. This is discussed in Section 5 of this paper.

Figure 1. GridPort portal implementation architecture diagram showing multiple portals installed on the same webserver machine. In this design, each portal has its own filespace, and shares the same instance of the GridPort modules and common portal account and authorization information. All access to the Grid is done through functions provided by GridPort and each request is authenticated against the account data.

3. GridPort Architecture

The GridPort Toolkit has been in use as the backend engine for the PACI and NPACI HotPages and other application portals since 1999, and has undergone continuous change as the needs of multiple application portals have evolved. Recently, the GridPort software has been updated and generalized to increase security, support a broader range of services, and make porting it to other sites easier and faster. In this section we present a brief overview of GridPort and its features and refer the reader to other publications containing more details [28, 32, 33].

GridPort allows multiple application portals to access the same instance of the GridPort library (see Figure 1). As a result, GridPort supports single login across these portals: this is accomplished by requiring, at present, that all portals be hosted on the same machine and in the same domain (for example, npaci.edu). Portals hosted by servers in the npaci.edu domain can access the same security data (e.g. cookies), which contain encrypted information about the session.

The current GridPort portal system has a ‘layered’ design that is depicted in Figure 2. In this schematic, we represent different parts of the portal system as layers. Each layer is a logical part of the portal through which data and service requests flow back and forth, and each handles some specific aspect or function of the GridPort portal system. The layers of the system are:
- clients, which are typically web browsers, PDAs, or applications;
- application portals: most NPACI application portals exist on the same machine and are served to clients by separate virtual webservers, while client services are intended to be used by external applications and portals;
- portal services: in addition to mediating between client requests and Grid services, the GridPort software also performs services for the portals and users;
- Grid services: the standard middle tier of the Grid, where Grid services such as Globus, Legion, SRB, NWS, AppLeS, and meta-schedulers are available;
- resources: a variety of resources make up the backend tier of the Grid, where compute and archival resources are located.

Figure 2. The diagram shows the layers used in the NPACI portal system. Each layer represents a logical part of the portal where data and service requests flow back and forth, and which handles some specific aspect or function of the GridPort portal system.

The current version of the GridPort toolkit employs the Globus Toolkit as the mechanism for connecting to remote computational resources. Any system running Globus can be added to the system by incorporating the data about the system into a flat text configuration file. Currently, the NPACI Portals connect to the following systems: IBM (Blue Horizon, SP); Compaq (TCS1); CRAY (T3E, T90); Sun (E10K); SGI (O2K); Hewlett Packard (V2500); and various workstations and clusters running Globus. These resources are located at several sites, including SDSC, NCSA, Pittsburgh Supercomputing Center, NASA/IPG, UT Austin, Univ. of Kentucky, and others. In addition, file archival systems running software compatible with the PKI/GSI certificate system are supported, and we have recently added the ability to access any file system running an SRB service.

Science portals based on HotPage and GridPort technologies have been implemented at the NASA Information Power Grid, the Department of Defense Naval Oceanographic Office Major Shared Resource Center, the National Center for Microscopy and Imaging Research (NCMIR), the University of Southern California (USC), the Daresbury Labs, UK, and the San Diego Supercomputer Center. The GridPort research effort has also resulted in collaboration among portal development groups at NPACI, NCSA, the Globus project, and NASA IPG to develop the PACI HotPage. The goal of the collaboration is to develop a common Grid infrastructure so that user portals can access resources across all the collaboration sites. Many of the sites that started portal development with HotPage code have gone on to build their own portal systems, which was the goal of the technology transfer aspect of this project.
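The flat text configuration file mentioned above is not reproduced in this paper. As a rough illustration only, the following Perl sketch parses a hypothetical format (one machine per line, a short name followed by key=value fields) into a hash of hashes keyed by machine name. The file format, the field names host, gv, and scheduler, the example.org host names, and the subroutine name parse_machine_config are all assumptions made for this sketch, not the actual GridPort format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical flat-text format (NOT the actual GridPort file format):
# one machine per line, a short name followed by key=value fields.
# Blank lines and lines starting with '#' are ignored.
my $example_config = <<'END_CONFIG';
# name        fields
bluehorizon   host=horizon.example.org  gv=1.1.3  scheduler=loadleveler
t3e           host=t3e.example.org      gv=1.1.3  scheduler=nqs
END_CONFIG

# Parse configuration text into a hash of hashes keyed by machine name.
sub parse_machine_config {
    my ($text) = @_;
    my %machines;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#.*)?$/;        # skip comments and blank lines
        my ($name, @fields) = split ' ', $line;  # split on runs of whitespace
        for my $field (@fields) {
            my ($key, $value) = split /=/, $field, 2;
            die "Malformed field '$field' for machine '$name'\n"
                unless defined $value;
            $machines{$name}{$key} = $value;
        }
    }
    return %machines;
}

# Adding a machine then amounts to adding one line to the file.
my %machines = parse_machine_config($example_config);
for my $name (sort keys %machines) {
    print "$name -> $machines{$name}{host} ($machines{$name}{scheduler})\n";
}
```

A line-oriented format of this kind keeps the task of adding a resource to a one-line edit, which is in the spirit of the simplicity goal described in Section 2.1.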

3.1. Technologies Employed

Wherever possible, the GridPort Toolkit employs simple technologies. On the client side, we do not require the use of client-side XML or applets, since some of our users still use older browsers and platforms that do not support these technologies. On the server side, we chose to implement the portal software in Perl/CGI since this software is easy to install under most webservers and is supported on all HPC systems, making it easy for portal developers to install and support. Several key Grid technologies are employed, including: the Globus/GRAM gatekeeper for interactive jobs [9]; the Globus Grid Security Infrastructure (GSI) and Myproxy for security and authentication [22]; the SDSC Storage Resource Broker (SRB) for distributed file collection and management [5]; and the Globus Grid Information Systems for information services [9]. GridPort is implemented in Perl 5/CGI, has been installed and tested under both Netscape and Apache webservers, and runs on any Unix/Linux OS/webserver combination that supports SSL encryption and the HTTPS protocol. A prototype Perl CoG Kit is under development and will be available by Spring 2002 [28]. The Commodity Grid project also provides CoG Kits in Java, Python, and CORBA.

3.2. Services Provided

Portals using the GridPort system provide two categories of services to the application portals running on the system: informational and interactive. Interactive services require user authentication, while informational services are open to any user without logging in to the portal. GridPort functions are used primarily by interactive services at this time. Interactive services are the secure transactions that provide users with direct access to HPC compute resources and allow the webserver to perform tasks for the user on those resources. Once a user is logged into an NPACI portal, the user has access to any NPACI-hosted portal that is part of the single login environment. Currently, GridPort v2.0 supports the following interactive functions:
- Accounts: the portal manages the user’s account and tracks sessions, preferences, and portal filespace;
- Authentication: users log on via authentication against GSI certificate data stored in the SDSC repository, or by using the Myproxy server;
- Jobs: remote tasks are executed via the Globus/GRAM gatekeeper, including compiling and running programs, performing job and batch script submission and deletion, and viewing job status and history;
- Commands: execution of simple Unix-type commands such as mkdir, ls, rmdir, cd, pwd;
- Files: file and directory access to Grid resource file space using FTP or SRB commands, and common file management operations on remote files, such as tar/untar and gzip/gunzip.

3.3. Security Issues

Session files and sensitive data, including user proxies, are stored in a restricted access repository. The repository directory structure is located outside of webserver filespace, and has user and group permissions set such that no user except the webserver daemon may access these files and directories. User login sessions are tracked via a browser cookie that is assigned a random value by the webserver when the user successfully authenticates to the portal. The random value in the cookie corresponds to a session file, which ties the cookie in the user’s browser to a specific user on the portal. The session file also contains a timestamp, which GridPort uses to expire user login sessions that have been inactive for a set period of time.

Portal users must create a portal account. The creation process requires the user to supply the portal with a digital certificate from a known Certificate Authority (CA). Once the user has presented the portal with this credential, the user will be allowed to use the portal with the digital identity contained within the certificate. In order to access a computational resource through the portal, the user must already have privileges on that computational resource. When a portal uses GridPort to make a request on behalf of the user, GridPort presents the user’s credentials to the computational resource, which decides based on the local security model whether the request will be honored or denied. The portal acts as a proxy, executing requests on behalf of the user on resources that the user is authorized to access based on the credentials they presented when they created their portal account. Portal users have the same level of access to a particular resource through the portal as they would if they logged into the resource directly.
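The session-tracking scheme described above (a random cookie value, a matching session file, and a timestamp used to expire inactive sessions) could be sketched in Perl roughly as follows. This is a minimal illustration under stated assumptions, not the GridPort implementation: the subroutine names, the two-line session file layout, the 30-minute timeout, and the use of /dev/urandom as the source of randomness are all hypothetical choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

my $SESSION_TIMEOUT = 30 * 60;   # assumed inactivity limit, in seconds

# Generate a random 32-character hexadecimal value for the cookie.
sub new_session_id {
    open(my $fh, '<:raw', '/dev/urandom')
        or die "Cannot open /dev/urandom: $!";
    read($fh, my $bytes, 16) == 16 or die "Short read from /dev/urandom";
    close($fh);
    return unpack('H*', $bytes);
}

# Create a session file that ties the cookie value to a portal user.
sub create_session {
    my ($dir, $user, $now) = @_;
    my $id   = new_session_id();
    my $path = File::Spec->catfile($dir, $id);
    open(my $fh, '>', $path) or die "Cannot write $path: $!";
    print {$fh} "$user\n$now\n";
    close($fh) or die "Cannot close $path: $!";
    chmod 0600, $path;           # readable by the webserver user only
    return $id;
}

# Return the user name for a live session and refresh its timestamp;
# return nothing if the cookie value is malformed, unknown, or expired.
sub check_session {
    my ($dir, $id, $now) = @_;
    return unless defined $id && $id =~ /^[0-9a-f]{32}$/;
    my $path = File::Spec->catfile($dir, $id);
    open(my $fh, '<', $path) or return;
    my @lines = <$fh>;
    close($fh);
    chomp @lines;
    my ($user, $last_seen) = @lines;
    return unless defined $last_seen && $last_seen =~ /^\d+$/;
    if ($now - $last_seen > $SESSION_TIMEOUT) {
        unlink $path;            # expire the inactive session
        return;
    }
    open($fh, '>', $path) or die "Cannot update $path: $!";
    print {$fh} "$user\n$now\n";
    close($fh);
    return $user;
}

# Demonstration in a throwaway directory standing in for the repository.
my $repository = tempdir(CLEANUP => 1);
my $cookie     = create_session($repository, 'alice', time);
print "session belongs to: ", check_session($repository, $cookie, time), "\n";
```

Validating the cookie value against a fixed pattern before building a file path keeps a forged cookie from naming files outside the repository.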

Security between the client web browser and the web server is handled by SSL using a 56- or 128-bit key. For communications between the web server and the Grid, the SSL RSA X.509 certificate technology that is a feature of the GSI Toolkit authentication system (gssapi_ssleay) is employed. User authentication is accomplished by providing the portal with a valid proxy file. The proxy may be generated from a key/certificate pair, or the portal may retrieve the proxy on behalf of the user from a Myproxy server. The password used for proxy retrieval or proxy generation by the portal is never stored and travels the Internet via a secure HTTPS connection.

3.4. Limitations

The GridPort Toolkit has proven to be very useful and modular software. The code has been ported to several systems and is used for multiple production applications. However, due to the design constraints and chosen technologies, there are limitations to the system.

First, the portal software is based on Perl/CGI, which is known to have lower performance on webservers than technologies such as Java servlets because each time a new request arrives, the webserver must instantiate a new Perl process. There are software systems available that incorporate Perl processes into the webserver process, but we have not used them. However, in the day-to-day operation of the HotPage and other portals, we have not noticed any unacceptable latencies associated with using Perl.

There are, however, latency issues associated with the portal systems as a whole. This becomes a perception problem, since portal users tend to expect that events on the portal will happen instantaneously, which is not always the case. The latencies are due to factors such as file upload/download/transfer through the webserver itself, network latencies (out of the control of the portal software), and known latencies associated with using the Globus GRAM gatekeeper.

Other limitations are associated with how the web pages are built. As discussed above, due to the typical profile of users within the PACI community, we had to limit client-side features to the simplest set, and so the pages are constructed of simple HTML and JavaScript, and all the portals are frames-based (frames are used to display results and request status). The result is that the portal itself is often much simpler in layout and design than one that would be built as an application running locally using advanced technologies such as Java. We approach this from a ‘level of service’ view: as a portal user community becomes more sophisticated, we add more advanced technologies. For example, in the GAMESS portals, users now have the option to install a local version of OpenGL visualization software, download the visualization program QMView as a plugin, and interactively view results.

Finally, creating Grid portal accounts is cumbersome. The current GridPort toolkit supports users with GSI certificates, and we can only make certificates for users with NPACI accounts. This is out of the control of the portal development community: there is a need for meta-computing accounts and allocations, but there currently exists no software for managing Grid accounts, and the allocations process is still focused on assigning compute time on single machines rather than on Grid systems.

4. GridPort Application Portals

Currently, there are several production portals in use at NPACI that are based on the GridPort Toolkit, and we briefly describe two in this paper (for a more detailed description, please refer to [32, 33]). Figure 1 depicts the current approach used for implementing GridPort-based portals. In this design, the webserver and the application portals hosted on that webserver all have access to the same logical filespace. In addition, each portal has its own file space, where specialized scripts and data are located. GridPort also manages authentication and account information for all portals hosted on the local system; thus all portals running on that system share a single login environment.

The application portal developer incorporates the GridPort libraries directly into the code, and makes subroutine calls to GridPort software to access the functionality that GridPort provides. A batch job submission tool from the HotPage, for example, contains only a few lines of code that access GridPort. An example of the code needed to submit a job is presented below for the HTML web page, the resultant Perl/CGI code invoked on the web server, and the subsequent Perl code accessed from the GridPort Toolkit. For the HotPage, and most portals, most of the coding work is devoted to parsing the CGI data that the user entered into a web page form. The general approach used for a specific application portal is to provide a frames-based template, which allows for rapid project prototyping: our experience has shown that developing portals with this system is part of a useful startup cycle that helps teach and train developers to use the Grid.

In this section we demonstrate the simplicity of building a portal with examples of code used to submit a job to the NPACI portal system via GridPort. The code below shows the client-side (web browser) HTML needed to submit a job using an HTML FORM submission via an HTTPS request to a web server. In this example, we are submitting a simple MPI executable to a queue on the NPACI Blue Horizon. Line 1 defines the FORM/ACTION, which will invoke a Perl/CGI script located at https://hotpage.paci.org/tools/cgi-bin/job_submit.cgi. In line 2, we define the full path name of the executable (e.g. rmount/paci/sdsc/mthomas/mpi_pi).
Lines 3-9 define FORM elements for the arguments needed to submit a job to the queue. Once the user presses the form's SUBMIT button, a request is sent to the web server, which invokes the Perl/CGI program.

[Listing: HTML form source not preserved. Visible form labels: Executable Name; Arguments; Select Queue (low, normal, high, express); Number of CPUs; Max Time (min).]
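As a rough illustration of the kind of form described above, the following Perl sketch emits an HTML form whose field names match the CGI parameters read by the server-side script (exe, args, queue, cpus, max_time, mach) and whose labels and queue choices match those of the HotPage form. The markup itself, the POST method, the subroutine name job_submit_form, and the machine key bluehorizon are assumptions made for this sketch; it is not the HotPage source.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the HTML for a job submission form. Field names mirror the CGI
# parameters read on the server; the markup is assumed, and values are
# not HTML-escaped in this sketch.
sub job_submit_form {
    my ($action, $mach, @queues) = @_;
    my $options = join '',
        map { qq{      <option value="$_">$_</option>\n} } @queues;
    return <<"END_HTML";
<form action="$action" method="post">
  <input type="hidden" name="mach" value="$mach">
  Executable Name: <input type="text" name="exe"><br>
  Arguments: <input type="text" name="args"><br>
  Select Queue:
    <select name="queue">
$options    </select><br>
  Number of CPUs: <input type="text" name="cpus"><br>
  Max Time (min): <input type="text" name="max_time"><br>
  <input type="submit" value="Submit Job">
</form>
END_HTML
}

print job_submit_form(
    'https://hotpage.paci.org/tools/cgi-bin/job_submit.cgi',
    'bluehorizon',                    # hypothetical machine key
    qw(low normal high express),
);
```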

The key lines of server code needed to set up the user environment and local process variables, as well as the code needed to parse the CGI data (the standard Perl/CGI approach), are shown next. The first group of statements locates the script and loads the global configuration and supporting modules; the second group reads the form parameters and builds the arguments needed to make the call to the subroutine gridport_globus_job_submit. The returned array, @output, is then processed and returned to the user as a web page:

    ###GET THE SCRIPTS LOCATION AND THE GLOBAL VARS###
    $MY_LOCATION = "tools/cgi-bin";
    $CURRENT_DIR = `pwd`;
    ($PORTAL_ROOT, $rest) = split(/$MY_LOCATION/, $CURRENT_DIR);
    $GLOBAL_VARS_CONFIG = $PORTAL_ROOT . "cgi-bin/global_vars.cgi";
    require "$GLOBAL_VARS_CONFIG";

    # Load the portal authentication, GridPort job, and user helper modules
    require "$PORTAL_HOME_DIR/cgi-bin/hotpage_authen.cgi";
    require "$GRIDPORT_HOME_DIR/services/globus/cgi-bin/gridport_globus_job.cgi";
    require "$PORTAL_HOME_DIR/tools/cgi-bin/user_dirs.cgi";
    require "$PORTAL_HOME_DIR/tools/cgi-bin/user_jobs.cgi";

    # Read the form parameters ($query is the CGI query object, created
    # in code not shown in this excerpt)
    my $exe      = $query->param('exe');
    my $args     = $query->param('args');
    my $queue    = $query->param('queue');
    my $cpus     = $query->param('cpus');
    my $max_time = $query->param('max_time');
    $mach        = $query->param('mach');

    # Append the arguments to the executable path and submit the job;
    # the third argument (60) is the timeout passed to GridPort
    $exe = $exe . " $args";
    @output = gridport_globus_job_submit($mach, $cpus, 60, $exe, $max_time, $queue);

The GridPort code used within the gridport_globus_job_submit subroutine for job submission to a queue on an arbitrary machine is shown next. This is part of the GridPort module that is executed by the webserver process. The three statements that assemble $glob_sub build the RSL string for globus_run, based on a concatenation of configuration data (local file), process environment data, and user data. Note that we have excluded common configuration code contained at the top of the module. Both codes demonstrate the simplicity associated with Perl/CGI programming. Note that the example shown below is for the earlier, non-CoG version of GridPort, which has been developed into a Perl CoG Package [28]:

    sub gridport_globus_job_submit {
        my @jobid = ();
        my $user  = &get_username();

        # get the input array
        my ($mach, $cpus, $timeout, $exe, $max_cpu_time, $queue) = @_;

        &globus_config($user);   # verify data
        &mach_config($mach);     # verify/load configuration information

        # build RSL
        my $glob_sub = "$globus_job_submit{$machines{$mach}{gv}} ";
        $glob_sub .= "$machines{$mach}{name}{job} -np $cpus -queue $queue ";
        $glob_sub .= "-maxtime $max_cpu_time $exe";

        # run job (the original listing redeclared @jobid with "my" here)
        @jobid = run_command_timeout($glob_sub, $timeout);
        return @jobid;
    }
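The helper run_command_timeout is called above but its source is not shown in this paper. The following Perl sketch is a hypothetical stand-in, assuming that the helper runs a shell command, returns its output lines, and gives up after the given number of seconds; the error-line format is also an assumption. It uses the standard alarm/eval idiom for timeouts and runs only on Unix-like systems.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for GridPort's run_command_timeout (not the
# original code): run a command, return its output lines, and stop
# waiting after $timeout seconds. A timeout of 0 disables the alarm.
sub run_command_timeout {
    my ($command, $timeout) = @_;
    my (@output, $pid, $pipe);
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        $pid = open($pipe, '-|', $command)
            or die "cannot run '$command': $!\n";
        @output = <$pipe>;            # blocks until the command finishes
        alarm 0;
        1;
    };
    my $error = $@;
    alarm 0;                          # make sure no alarm is left pending
    if (!$ok) {
        kill 'TERM', $pid if $pid;    # stop the child before reaping it
        close($pipe) if $pipe;
        return ("ERROR: $error");
    }
    close($pipe);
    return @output;
}

# A fast command returns its output; a slow one is cut off.
print run_command_timeout('echo job-submitted', 5);
print run_command_timeout('sleep 3', 1);
```

Bounding each remote call in this way keeps a slow gatekeeper from tying up the CGI process indefinitely, at the cost of occasionally abandoning a request that would have succeeded.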

There are a variety of application portals based on GridPort, and in the following subsections we discuss two of them that demonstrate both the usefulness and simplicity of using GridPort for building application portals, as well as the effectiveness of these portals (a more detailed discussion can be found in [33]). While the HotPage is a low-level portal, allowing authenticated users to log on and directly access accounts on HPC systems through a web browser, the GridPort Toolkit also supports developers interested in providing Web access to their scientific applications. The resulting applications approach the true promise of the Grid: a scientist interacts with a standard web interface, while an application runs on potentially many high-performance computers at remote locations in response to the scientist's request. GridPort is currently being used to create web interfaces to chemistry, drug dosage, and other applications in collaboration with computational scientists at SDSC. Evidence of its ease of use can be found in the fact that many portals are being developed around the world [35, 4], and many of these sites have also found the HotPage concept to be of use, and have either installed the HotPage software or introduced modified versions of the software [3, 23].

Figure 3. Screenshot of the NPACI HotPage during an interactive session in which an authenticated user is preparing to submit an MPI job to the Blue Horizon queue.

4.1. The NPACI HotPage

The HotPage user portal is designed to be a single point of access to all Grid resources that are represented by the portal. In the case of NPACI and the PACI portals, these resources span the PACI Grid and NASA Information Power Grid. It provides access to both informational and interactive services (see 3). Informational services comprise web pages and links to existing documentation and dynamically updated status data. Interactive services are secure transactions that provide users with direct access to the HPC compute resources and allow users to access their HPC accounts, where they can submit, monitor, and delete jobs, manipulate directories and files, and manage accounts. The layout of the website allows a user to view these resources from both the Grid and individual resource perspectives, in order to quickly determine system-wide information such as operational status, computational load, and available compute resources. This design provides simple, fast, and direct access to HPC systems and general information about them. Recent modifications to the interactive portion of the website include a new file navigation paradigm, new tools for editing remote files, job submission wizards, and multiple file selection capabilities.

The informational services provide a user-oriented interface to NPACI resources and services. They consist of on-line documentation, static informational pages, and links to events within NPACI, including basic user information such as documentation, training, and news. The informational services also provide real-time information for each machine, including operational status and utilization of all resources; summaries of machine status, load, and batch queues; displays of currently executing and queued jobs; and a downloadable graphical map of running applications mapped to nodes. This data is published via the Grid Information Services (GIS), or via the HotPage. The GIS services allow us to share data with, or access data from, other portal or Grid systems such as the Alliance/NCSA or the Cactus Grids. New versions of the Globus GRIS services will provide authentication via the GSI security infrastructure, allowing the GIS to also be used to obtain and publish user-specific data such as job status and user account information.

Interactive services are the secure transactions that provide users with direct access to the HPC compute resources and allow the webserver to perform tasks for the user on those resources. To access the interactive services, the user needs to log in and authenticate, which is based on GSI. Since GridPort supports a single login environment, once a user is logged into the HotPage, the user is logged in to any NPACI-hosted portal. After authentication, the user is able to access accounts with the same privileges as if they had logged directly onto the system via standard login.

Figure 4. Basic architecture diagram of the Telescience portal, in which portal users will have access to remote instrumentation (microscope), automatic SRB collection access for storage of raw results and analyzed data; portal tools for creating and managing projects; and remote visualization applications.

4.2.

The NCMIR Telescience Portal

The Telescience portal is a web interface to tools and resources that are used in conducting biological studies involving electron tomography at NCMIR, the National Center for Microscopy and Imaging Research at UCSD [30]. This portal is an example of a system where legacy code was modified to include calls to the GridPort Toolkit, with minimal changes to the original code. The basic architecture of the Telescience portal is shown in Figure 4: portal users have access to remote instrumentation (microscope) via an applet; automatic SRB collection access for storage of raw results and analyzed data; portal tools for creating and managing projects; and remote visualization applications. The portal guides the user through the tomography process, from an applet-based data acquisition program to analysis. A central component of the portal is a tool known as FastTomo, which enables microscopists to obtain near real-time feedback (in the range of five to fifteen minutes) from the tomographic reconstruction process, online and without calibration, to aid in the selection of specific areas for tomographic reconstruction. By using SRB, the portal provides an infrastructure that enables secure and transparent migration of data between data sources, processing sites, and long-term storage. Telescience users can perform computations and use SRB resources at NCMIR and SDSC, and on NASA’s Information Power Grid (IPG). The tomography process generates large quantities of data (several hundred megabytes per run, on average), and the portal moves data between these locations over high-speed network connections between compute and archival resources (such as the Sun Ultra E10k and the HPSS storage system).


5.

NPACI Portal Services

The NPACI portals program has evolved from developing a single user portal (HotPage) to a unique systems approach that provides a set of services ranging from hosting multiple application portal web pages to providing access, via Web services, to all the resources that are available within large programs such as the PACI or NASA/IPG Grids [23, 24]. Through the PACI HotPage (http://hotpage.paci.org) and the ‘Portal Services’ provided by the portal system, direct access to all PACI resources is available to multiple users and remote applications. The integration and coordination among the partner sites represents a significant amount of effort, which raises the question of how to save other users and developers time and effort by providing the portal as a system of services to other organizational applications. The layered architecture of GridPort (Figure 2) reflects this services approach and is a precursor to the emerging commercial web services [2] and the Open Grid Services Architecture (OGSA) proposed in [4]. The HotPage and other application portals discussed here, together with the services shown in Figure 2, effectively represent the PACI portals and services as a collection of virtual organizations (VOs). This concept was introduced by Foster et al., has been proposed as an architecture standard within the GGF [3], and is conceptualized in Figure 5. A long-term goal of this project has been to make the services developed for the PACI organization as a result of HotPage development accessible to other portal users (both local and remotely hosted), Grid applications, Grid services, and Grid Web services. The emerging web services architecture facilitates achieving this goal at the application layer, and will be useful to portal and application developers.
Additionally, we believe that there is still a need for remote clients to access the PACI Grid services with simple mechanisms, such as command line tools or a locally hosted web page using HTML. In this section, we present a short discussion of our experiments in providing portal services to remote Web clients followed by a discussion of our plans for Grid Web services.

Figure 5. The virtual organization (VO) represented by the GridPort and HotPage portal systems, starting at the NSF layer. Each HotPage and the supporting services act as gateways to a collective set of Grid resources. In the case of the NPACI HotPage, resources at all NPACI partner sites are accessible to the users.

5.1.

Remote Client Services

We designed a simple experiment to determine the efficacy of the portal services oriented architecture within the GridPort portal system that would allow clients to build locally hosted web pages or applications that access the PACI/NPACI portal services remotely, or from resources and websites hosted on local machines or from other portal systems. The initial system was built with a very elementary protocol, using HTML/FRAMES, FORM elements, and HTTPS/CGI. The FORM elements were used as the client mechanism for sending data to the service portal. A portal service running at https://portals.npaci.edu was implemented with a set of wrapper Perl scripts around the standard GridPort libraries. A set of simple client web pages was built with HTML, running on a local webserver, browser, or laptop. This system works with a set of very basic and limited functions including file listing, batch job submission, binary execution, and authentication, all managed by the web service portal. The software has been bundled into the GridPort Client Toolkit, which consists of a set of client tools and services, protocols, and example web pages [18]. To test the software, we developed a set of simple frames that can be run on any local system or laptop. Continuing with the job submission examples given above, we show an example of HTML code used on the client browser to submit a job to the NPACI portal services below:


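[HTML listing, lines 1 to 9: the markup was lost in extraction. Surviving form text: resource options "Blue Horizon", "HPC 10K/64", "T3E/272", and the field label "Script Path:".]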

This uses basic HTML with a minimal set of required FORM elements as well as some HIDDEN variables. In line 3, the user selects the COMP_RESOURCE_KEY, which is used by the service portal to determine which resource to send the job to. In line 9, we allow the user to select a return format (RET_FMT), so that text, HTML, or XML can be returned. The protocol is very simple, and follows a key=value pair approach. Details can be found on the project website [18]. The experiment above demonstrates that ‘wrapping’ existing GridPort functionality with simple CGI web services scripts can be done in a straightforward manner. The next step in our development will be to define additional services to be provided (such as GIS, job submit, file transfer, batch script generator, SRB collection management, and others), and to evaluate the effect of the emerging web services and OGSA on this design. We feel that application scientists will be able to take advantage of this simple programming approach, which will allow them to participate on the Grid without needing to allocate the resources currently needed to build Grid portals.
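The key=value exchange described above can be illustrated with a minimal Python sketch. It is not the GridPort Client Toolkit code (which is Perl/CGI and HTML); it only shows how a client might encode the form fields and how a wrapper script might decode them and honor RET_FMT. The field name SCRIPT_PATH, the resource key values, and the reply layout are assumptions made for illustration; only COMP_RESOURCE_KEY and RET_FMT are named in the text.

```python
import html
from urllib.parse import urlencode, parse_qs

# Hypothetical resource keys; the real COMP_RESOURCE_KEY values are not given in the paper.
KNOWN_RESOURCES = {"BLUE_HORIZON", "HPC10K", "T3E"}
RETURN_FORMATS = {"text", "html", "xml"}


def build_request(resource_key, script_path, ret_fmt="text"):
    """Client side: encode the form fields as a key=value request body."""
    return urlencode({
        "COMP_RESOURCE_KEY": resource_key,
        "SCRIPT_PATH": script_path,  # hypothetical field name
        "RET_FMT": ret_fmt,
    })


def parse_request(body):
    """Service side: decode the key=value pairs and validate them."""
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    if fields.get("COMP_RESOURCE_KEY") not in KNOWN_RESOURCES:
        raise ValueError("unknown compute resource")
    if fields.get("RET_FMT", "text") not in RETURN_FORMATS:
        raise ValueError("unsupported return format")
    return fields


def format_reply(fields, status):
    """Service side: render the reply in the format the client asked for."""
    fmt = fields.get("RET_FMT", "text")
    resource = html.escape(fields["COMP_RESOURCE_KEY"])
    status = html.escape(status)
    if fmt == "xml":
        return f"<reply><resource>{resource}</resource><status>{status}</status></reply>"
    if fmt == "html":
        return f"<p>{resource}: {status}</p>"
    return f"{resource}: {status}"
```

In this sketch the client would send the string returned by `build_request` as the body of an HTTPS POST, and the wrapper script would pass the parsed fields on to the underlying job submission library before calling `format_reply`.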

5.2.

NPACI Web Services

The term “web services” describes a service-oriented architecture in which distributed applications that perform some specific function are wrapped in a common protocol based on XML, SOAP [29], and WSDL [36]. These services are published as part of a service or an aggregate of services. A client goes to an information service and locates an application service that satisfies the request, obtains the


needed XML schemas and protocols, and uses this information to submit a request to the published service. The ability to define common protocols, systems, and toolkits for Grid portals, applications, and Grid services to publish, reuse, and share is clearly important. The GridPort team is working on developing Web service ‘wrappers’ for many of the services provided by the NPACI HotPage and the NPACI Grid Information Services (GIS), as well as many of the interactive services. Currently, we are developing web services based on the current commercial standards, but will be evaluating the impact of the proposed OGSA standards submitted to the GGF by the Globus and IBM teams. Since Globus is part of the common framework of the NPACI Grid, we will integrate these protocols into our designs. The impact of a web services architecture on the development of the Grid is illustrated in Figure 6, which shows a conceptual diagram of how portals or applications, hosted at various webservers or resources, can utilize Grid web services that are provided by multiple organizations. Note that in this design web services will also be able to function as clients of other web services. Since the developers need only program to a protocol, implementation details such as the programming language, the database, or even which Grid software is used become irrelevant as long as the protocols can be agreed upon.
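The seven steps listed for Figure 6 could be sequenced by a portal roughly as in the following Python sketch. This is an illustration under stated assumptions, not the GridPort implementation: `StubServices` is a hypothetical stand-in that only records calls, and the service names Status-WS and Akenti-WS, as well as the action names, are invented for the example. The point it makes is the one in the text: the portal programs only against service interfaces, and step (6) happens inside the job-submit service rather than in the portal.

```python
from dataclasses import dataclass, field


@dataclass
class StubServices:
    """Hypothetical stand-in for remote web services; every call is only logged."""
    log: list = field(default_factory=list)

    def call(self, service, action, **params):
        self.log.append(f"{service}.{action}")
        if service == "JobSubmit-WS":
            # Step (6): the job-submit service, not the portal, contacts the GRAM gatekeeper.
            self.log.append("GRAM.submit")
        return {"ok": True, **params}


def run_job(services, user, resource, collection, script):
    """Portal-side sequence for steps (1) to (5) and (7) of Figure 6."""
    steps = [
        ("MyProxy-WS", "authenticate", {"user": user}),                         # (1)
        ("Status-WS", "check", {"resource": resource}),                         # (2)
        ("Akenti-WS", "authorize", {"user": user, "resource": resource}),       # (3)
        ("SRB-WS", "get_files", {"collection": collection}),                    # (4)
        ("JobSubmit-WS", "submit", {"resource": resource, "script": script}),   # (5)
        ("SRB-WS", "put_files", {"collection": collection}),                    # (7)
    ]
    for service, action, params in steps:
        reply = services.call(service, action, **params)
        if not reply["ok"]:
            raise RuntimeError(f"{service}.{action} failed")
    return services.log
```

Replacing `StubServices` with a class that issues real requests would leave `run_job` unchanged, which is the property the protocol-only design is meant to provide.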
Figure 6. This diagram conceptualizes how multiple portals or applications (represented by the top layer), hosted across multiple webservers or resources, might utilize Grid web services. Steps: (1) Texas Portal (TP) authenticates the user via the MyProxy Web Service; (2) TP checks resource status; (3) TP checks AKENTI authorization; (4) TP pulls files from the collection; (5) TP submits a request to the JobSubmit web service (JSWS); (6) JSWS submits the request to the GRAM gatekeeper; (7) when done, TP sends a request to SRB-WS to migrate files into the collection.

The set of services will be based on those being constructed as part of the “Interoperable Web Services Testbed Project” [34]; this work is being done in collaboration with members of the Global Grid Forum (GGF) Grid Computing Environments Research Group (GCE). The GCE has proposed the following minimal set of web services, which are needed to perform basic tasks: job management, scripts; accounts and allocations; data management (files, SRB, FTP); events; Grid messaging; scheduling; security (GSI, Kerberos); applications (factories); collaborations. It is expected that many of these will have some initial implementations by GGF5 (Summer, Scotland). The purpose of the testbed is not to develop standards but to set up a Grid environment where we can explore issues associated with interoperability among developers and organizations, examine security and policy mechanisms, test workflow concepts, etc. At this time, the planned web services for NPACI portals will include information services (load, node, status, queue usage, batch script generators, etc.) that are both secure and public. This is now possible because the Globus MDS 2.1 supports authenticated queries. For informational services, we were able to produce a simple GIS/MDS query, running on an Apache web server, using Perl/GCE and SOAP::Lite. Our initial interest was to continue to use Perl, in order to maintain the development environment currently in use for GridPort. However, our attempts to develop secure web services based on GSI, which requires proxy delegation, failed because the Perl CRYPT modules do not currently support this feature. We believe that this is because the fundamental Perl module needed for SSL/HTTPS/PKI has not been updated since 1999. As a result, we have begun to use Python for development of our secure web services, which is supported by the work done by Jackson et al. at Lawrence


Berkeley Labs who were able to integrate GSI proxy delegation into their libraries [19]. Similarly, the team at Indiana has also solved this problem in Java, so we will also develop secure services based on Java [7].
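As an illustration of what a request to an informational web service looks like on the wire, the following Python sketch builds a SOAP 1.1 envelope using only the standard library. It is not the SOAP::Lite query described above and includes no GSI security or proxy delegation; the service namespace `urn:example:npaci-gis`, the operation name `getResourceStatus`, and the `resource` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
GIS_NS = "urn:example:npaci-gis"                        # hypothetical service namespace
NAMESPACES = {"s": SOAP_ENV, "g": GIS_NS}


def build_status_request(resource):
    """Return a SOAP 1.1 envelope asking for the status of one resource."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # Hypothetical operation and parameter names.
    operation = ET.SubElement(body, f"{{{GIS_NS}}}getResourceStatus")
    ET.SubElement(operation, "resource").text = resource
    return ET.tostring(envelope, encoding="unicode")


def read_resource(xml_text):
    """Service side: pull the requested resource name back out of the envelope."""
    root = ET.fromstring(xml_text)
    node = root.find("s:Body/g:getResourceStatus/resource", NAMESPACES)
    return node.text
```

A client would POST the returned string to the service endpoint over HTTP or HTTPS; a secure service would additionally need the GSI proxy delegation discussed above, which this sketch does not attempt.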

6.

Conclusions

In this paper we present an update on the current status of portal development within the NPACI and PACI organizations, review the status and activities of the NPACI GridPort Project, and describe the steps taken towards integrating the emerging web services architecture into our framework. Several production application portals have been developed based on the GridPort Toolkit, and those that run on the same domain (e.g. *.npaci.edu) share a single login and portal account environment. The ease of use of GridPort is evidenced by the international adoption of this software. We also discussed our idea that application scientists will be able to take advantage of a simple programming framework, based on our layered portal services approach, that will allow them to participate on the Grid without needing to allocate the resources currently needed to build Grid portals. The GridPort Client Toolkit allows portals to be hosted on local resources and to access the NPACI portal system using only HTML/FORMS/CGI. We discuss in this paper our thoughts on

how large organizational projects, such as the NPACI GridPort Portals Project, will be able to provide integrated portal, Grid, and Web services to the user community and act as gateways to resources for tasks such as GIS, job submit, file transfer, batch script generator, SRB collection management, and others. Our experiences in integrating the emerging web services architecture have convinced us that this approach will help simplify the development of overall Grid services and simplify the applications and portals that will use them. Unfortunately, we have found that the Perl community has not kept up with security requirements (in particular, CRYPT does not support PKI proxy delegation), and this has effectively ruled out Perl as a development language for Grid Web services. Python and Java will be our development languages of choice, including for the next generation of the GridPort Toolkit.

7.

References

[1] Akarsu, E., Fox, G., Haupt, T., Kalinichenko, A., Kim, K., Sheethalnath, P., and Youn, C. Using Gateway System to provide a desktop access to high performance computational resources. Proceedings of the Eighth IEEE International Symposium on High Performance Distributed Computing, August, 1999.

[2] Allen, G., Benger, W., Goodale, T., Hege, H., Lanfermann, G., Merzky, A., Radke, T., Seidel, E. The Cactus Code: A Problem Solving Environment for the Grid. Proceedings of the Eighth IEEE International Symposium on High Performance Distributed Computing, August, 1999.

[3] Alliance User Portal. Last accessed on 1/1/02 at http://www.ncsa.edu.

[4] ApGrid Testbed. Last accessed on 1/1/02 at http://www.apgrid.org.

[5] Baru, C., Moore, R., Rajasekar, A., Wan, M. The SDSC Storage Resource Broker. Proc. CASCON'98 Conference, Nov. 30-Dec. 3, 1998, Toronto, Canada.

[6] Borck, J. R. Web Services: Next Generation e-biz. Issue 20, InfoWorld, May 14, 2001, pg 77.

[7] Extreme Computing Project, Grid Web Services Page. Last accessed on 2/1/02 at http://www.extreme.indiana.edu/xgws.

[8] Foster, I., and Kesselman, C. Globus: A metacomputing infrastructure toolkit. International Journal of Supercomputer Applications, 11(2):115-129, 1998.

[9] Foster, I., and Kesselman, C., editors. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, 1998.

[10] Foster, I., Kesselman, C., and Tuecke, S. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications, 15(3):200-222, 2001. www.globus.org/research/papers/anatomy.pdf.

[11] Foster, I., Kesselman, C., Nick, J., and Tuecke, S. The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Globus Project, 2002. www.globus.org/research/papers/ogsa.pdf.

[12] GAMESS Portal: The Laboratory for Pharmacokinetic Modeling. Last accessed on March 25, 2001 at https://gridport.npaci.edu/GAMESS.

[13] Bhatia, D., Burzevski, V., Camuseva, M., Fox, G., Furmanski, W., and Premchandran, G. WebFlow – a visual programming paradigm for Web/Java based coarse grain distributed computing. Concurrency: Practice and Experience, 9(6):555-578, 1997.

[14] Global Grid Forum Website. Last accessed on 1/1/02 at http://www.gridforum.org.

[15] Grid Computing Environments Working Group, Global Grid Forum. http://www.computingportals.org.

[16] Grid Physics Network Project Description. Available: http://www.gryphyn.org/info/documents/proj-desc1.0.pdf.

[17] Grid Portal Development Toolkit (GPDK). National Laboratory for Applied Network Research. Last accessed on 6/6/01 at http://dast.nlanr.net/Projects/GridPortal.

[18] GridPort: The NPACI Grid Portal Toolkit. Last accessed on 1/1/02 at http://gridport.npaci.edu/donwload.

[19] Jackson, K. R., py

[20] Koontz, J. E., McCormack, R. P., and Devaney, J. E. WebSubmit: a paradigm for platform independent computing. Proceedings of the Workshop on Seamless Computing, Reading, England, September, 1997.

[21] LAPK Portal: The Laboratory for Pharmacokinetic Modeling. Last accessed on March 25, 2001 at https://gridport.npaci.edu/LAPK.

[22] MyProxy, National Laboratory for Applied Network Research. Last accessed on 1/1/02 at http://dast.nlanr.net/Projects/MyProxy.

[23] NASA IPG Launch Pad Portal. Last accessed on 1/1/02 at http:/www.computingportals.org/Cpdoc/LaunchPad.doc.

[24] National Partnership for Advanced Computational Infrastructure (NPACI). Project website last accessed on June 1, 2001 at http://www.npaci.edu.

[25] NPACI HotPage User Portal. Last accessed on 1/1/02 at https://hotpage.npaci.edu.

[26] Bramley, R., Chiu, K., Diwan, S., Gannon, D., Govindaraju, M., Mukhi, N., Temko, B., Yechuri, M. A Component Based Services System for Building Distributed Applications. Proceedings of the Ninth IEEE International Symposium on High Performance Distributed Computing, August, 2000.

[27] Romberg, M. The UNICORE system: seamless access to distributed resources. Proceedings of the Eighth IEEE International Symposium on High Performance Distributed Computing, August, 1999.

[28] Mock, S., Dahan, M., von Laszewski, G., Thomas, M. The Perl Commodity Grid Toolkit. Grid Computing Environments: Special Issue of Concurrency and Computation: Practice and Experience, to be published Winter, 2002.

[29] SOAP: Simple Object Access Protocol. Available at: http://www.w3.org/TR/SOAP.

[30] Telescience Portal: The Laboratory for Pharmacokinetic Modeling. Last accessed on March 25, 2001 at https://gridport.npaci.edu/Telescience.

[31] Thomas, M. P., Dahan, M., Mueller, K., Mock, S., Mills, C., Regno, R. Application Portals: Practice and Experience. Grid Computing Environments: Special Issue of Concurrency and Computation: Practice and Experience, to be published Winter, 2002.

[32] Thomas, M. P., Mock, S., Boisseau, J. Development of Web Toolkits for Computational Science Portals: The NPACI HotPage. Proceedings of the Ninth IEEE International Symposium on High Performance Distributed Computing, August, 2000.

[33] Thomas, M. P., Mock, S., Dahan, M., Mueller, K., Sutton, D. The GridPort Toolkit: a System for Building Grid Portals. Proceedings of the Tenth IEEE International Symposium on High Performance Distributed Computing, August, 2001.

[34] Thomas, M. P. The GGF/GCE Interoperable Grid Portal Web Services Testbed. Available at http://www.computingportals.org/testbed.

[35] Allan, R. J. Daresbury Laboratory CLRC e-Science Centre HPCGrid Services Portal. Last accessed on 1/1/02 at http://esc.dl.ac.uk/HPCPortal/.

[36] WSDL: Web Services Description Language. Available at: http://www.w3.org/TR/wsdl.
