J Grid Computing (2012) 10:709–724 DOI 10.1007/s10723-012-9234-3
The Charité Grid Portal: User-friendly and Secure Access to Grid-based Resources and Services Jie Wu · René Siewert · Andreas Hoheisel · Jürgen Falkner · Oliver Strauß · Dinko Berberovic · Dagmar Krefting
Received: 15 December 2011 / Accepted: 6 September 2012 / Published online: 12 October 2012 © Springer Science+Business Media B.V. 2012
Abstract The Charité Grid Portal combines portal components from different groups and projects to provide domain researchers with a gateway to Grid-based biomedical applications. Trusted users can securely access and employ Grid resources and services. In this paper, five portal components are presented: (1) The credential management component administers user credentials and uses them to authenticate users to the Grid. (2) The brain imaging data analysis component (FSL) submits workflows to the Grid as part of medical image processing. (3) The integrated web services of the Generic Workflow Execution Service (GWES) manage workflows executed by users in the Grid. (4) The data management component provides secure and efficient data management in a Grid environment and enables high-speed data transport between user and Grid. (5) The lung sound analysis application provides twofold pseudonymization before data are transferred to the Grid. The implementation as standardized portlets allows easy integration of specific components into different Grid portals, such as the VO-specific MediGRID and PneumoGrid portals.
Keywords Grid computing · Grid portal · Security · Usability · HealthGrids
J. Wu (corresponding author) Charité – Universitätsmedizin Berlin, Berlin, Germany e-mail: [email protected]
R. Siewert · D. Krefting HTW Berlin, Berlin, Germany
A. Hoheisel Inubit AG, Berlin, Germany
J. Falkner · O. Strauß · D. Berberovic Fraunhofer Institute IAO, Stuttgart, Germany

1 Introduction
Grids have now been employed in biomedical research for many years, forming so-called HealthGrids [1]. In particular, scientific research requiring intensive parallel computing benefits significantly from Grid computing: the processing time of expensive applications can be reduced tremendously by using Grids. Many national and international Grid projects, such as Vlemed [2], the GATE-Lab [3] and MediGRID [4]1, have already demonstrated the successful use of Grid computing for biomedical applications. Moreover, Grids are envisioned for use in diagnosis and patient care. Grids offer efficiency not only by sharing computing power, but also by sharing data and applications across institutional boundaries.

Research in Grids has been conducted for more than a decade now, and the obvious advantages of Grid infrastructures have been well documented. It is therefore surprising that the use of Grids hardly goes beyond the groups that are themselves active in Grid research. Thus, the main objective of academic Grids, namely to share resources among domain researchers without requiring any knowledge of the underlying infrastructure, has not yet been achieved.

On the other hand, distributed computing and storage are nowadays easily and widely available to everyone through Cloud services offered by several companies. So what are the relevant differences between a public Cloud and a scientific Grid, and what prevents the latter from being widely adopted by domain researchers? Important differences, such as the heterogeneity of a Grid infrastructure and the complexity of scientific applications, are today successfully addressed by Grid workflow systems that execute and orchestrate Grid applications [5]. However, the variety and complexity of Grid applications also imply that a domain researcher requires sufficient guidance through the application. In particular, the definition of input parameters, possible interactive checkpoints, intermediate visualization and access to the results are important issues that require application-specific user interfaces.

There are also important differences in the security requirements for Grid infrastructures; these differences originate from both the resource provider's and the user's side. The resource providers want to ensure that only authenticated users have access to the resources and services they are authorized for. The users demand that their privacy be guaranteed. Meeting these high security requirements in an open Grid infrastructure that is designed to share resources is a challenging task. Common publicly available Cloud services guarantee only low security levels [6].

1 As part of the German Initiative D-Grid, currently under transition to NGI-DE.
A typical barrier to Grid use is the reliance on Grid-specific transfer protocols and connection ports, which are blocked by common institutional firewall settings. As a result, many of the client tools offered to Grid users cannot be used by domain researchers behind firewalls [7].
J. Wu et al.
User privacy encompasses not only the processed data, but also information about the users who are using the Grid. Security is particularly crucial in biomedical research, where patient data are processed and data protection laws apply [1]. Only trusted users may access the services and resources in the secure Grid environment. Therefore, both D-Grid2-based systems, PneumoGRID3 and MediGRID, focus on fine-grained user management using a Virtual Organization (VO) and sub-VO (so-called group) structure. All VO members are assigned to a corresponding group in the local operating system of the resources. The registration process is managed by the VO Management Registration Service (VOMRS) [8, 9]. During registration, the applicant has to accept the usage policies of the respective VO, so that a certain legal basis for the provision and use of Grid resources is established. Users are identified via Grid user certificates (X.509 certificate-based authentication, the central concept of the Grid Security Infrastructure (GSI)) [10].

An even more severe problem arises from user certificate handling. While virtually all internet users rely on public key infrastructures (PKI) [11] for secure https connections, the Grid infrastructure demands user certificates, which are still rarely used in daily life. Since we started working with biomedical Grids in 2005, the management of user authentication and authorization has continuously been one of the main obstacles for our domain researchers. The different passwords and tools needed to manage certificates, proxy certificates and credentials were so confusing that many potential users gave up on using the Grid services. The same applies to data management on the Grid. While web storage services like "Dropbox"4 are very easy to use, transferring data to Grid storage is often hindered by firewall restrictions or authentication problems.
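In Globus-based deployments, the mapping of VO members to local accounts or groups is commonly kept in a grid-mapfile, in which each line pairs a double-quoted certificate DN with one or more local account names. The minimal Python sketch below parses such a file under that assumption; the paper does not describe how the D-Grid mapping was generated, and the DNs and account names used here are invented.

```python
import shlex


def parse_gridmap(text: str) -> dict[str, list[str]]:
    """Map each certificate DN to its local account(s).

    Assumed line format (grid-mapfile style): "<DN>" account[,account...]
    Blank lines and lines starting with '#' are ignored.
    """
    mapping: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # shlex keeps a double-quoted DN containing spaces together as one token
        parts = shlex.split(line)
        if len(parts) < 2:
            raise ValueError(f"malformed grid-mapfile line: {raw!r}")
        accounts = [a for a in ",".join(parts[1:]).split(",") if a]
        mapping[parts[0]] = accounts
    return mapping


# Invented example entries; real DNs and pool accounts differ per VO.
SAMPLE = '''
# VO members mapped to local accounts
"/C=DE/O=GridGermany/OU=Charite/CN=Jane Doe" mgrid001
"/C=DE/O=GridGermany/OU=HTW Berlin/CN=John Roe" pneumo007,pneumo008
'''

mapping = parse_gridmap(SAMPLE)
print(mapping["/C=DE/O=GridGermany/OU=HTW Berlin/CN=John Roe"])  # ['pneumo007', 'pneumo008']
```

Using shlex rather than a plain split keeps DNs with embedded spaces (such as an organizational unit "HTW Berlin") intact.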
In summary, in many Grid infrastructures security worked against usability, so one of the main challenges for the acceptance of Grid services by domain researchers has been to provide a sufficient level of usability.
2 http://www.d-grid.de/
3 http://www.pneumogrid.de/
4 https://www.dropbox.com/home
The Charité Grid Portal
A promising approach is web-based Grid portals, or scientific gateways, that allow plain web-based access to the Grid and provide graphical user interfaces both to generic Grid services, such as credential or data management, and to domain-specific applications. In this work, we present and discuss portal-based solutions developed within the German D-Grid initiative to enable and simplify the use of Grid services by domain researchers. They are brought together in the Charité Grid Portal, a Liferay5-based scientific gateway for Grid-based biomedical image and biosignal processing.

5 http://www.liferay.com/

1.1 Security Versus Usability
As already mentioned briefly in the introduction, both security and usability must be implemented at a relatively high level for biomedical research. Existing Grid security mechanisms must be adopted for biomedical Grid applications because of data security and data protection requirements. However, handling these security mechanisms is a major concern for end users of Grid applications. HealthGrids must find a balance between both needs, as stronger security mechanisms usually come at the expense of usability. From the application side, the application should behave safely, defaulting to behavior that protects the user from harm. From the user side, the application should be usable by inexperienced users [12]. We identified four main topics related to security and usability during the adoption of Grid infrastructures in the biomedical domain. These topics are listed in Table 1.

Table 1 Main issues related to security versus usability and our approaches to overcome usability issues

| Security mechanisms | Usability aspects | Approach |
|---|---|---|
| Institution's firewall configuration, blocking all outgoing connections except http/https on standard ports (80/443) | Many ports for Grid usage are blocked, making it impossible to use locally installed Grid clients. Clinical researchers or medical doctors have almost no chance to influence the institution's firewall settings. | Web-based access for Grid users. Connection to the Grid is exclusively realized by the portal server. |
| End-to-end usage of the Grid security infrastructure | Certificate and credential management is complicated and involves several steps and components. | Intuitive credential management, deriving most of the configuration from the user's credential stored in the browser. |
| Remote resources and services | Low control, giving some users the impression of loss of control. | Provision of several levels of monitoring detail. |
| Anonymization of medical data | No links between the medical data and the patient. | Pseudonymization along with a data protection concept. |
1.1.1 Restricted Firewalls
Biomedical research in general, and medical imaging and biosignal processing in particular, relies on human data: genetic samples, physiological measurements and images of body parts and organs. Most research is carried out in medical institutions such as university hospitals. As it is impractical for these institutions to strictly separate the clinical network from the research network, domain researchers are faced with protected networks that meet the strict regulations for medical data protection [13]. In the university clinics of Berlin, for example, no outgoing connections are allowed except for http on port 80 and https on port 443. The university clinic of Marburg does not even allow https connections by default, arguing that data protection violations cannot be checked on encrypted data transfers. Given such examples, one can imagine how IT staff react to requests from domain researchers to open several ports and large port ranges for different encrypted transfer protocols they are not familiar with. Our experience is that exceptions are possible for people working in medical informatics departments, as they are trusted to know exactly what they are doing, whereas colleagues from the clinical departments, who have direct access to patient data on their computers, did not succeed. For HealthGrids to be widely adopted, communication with the end user needs to use the permitted routes, which leads to plain web-based access.
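To make the consequence of such a policy concrete, the following Python sketch models an outbound firewall that only permits ports 80 and 443 and lists which Grid endpoints a workstation could not contact directly. It only evaluates the policy and does not probe the network; the ports for GSI-SSH (2222), GSI-FTP (2811) and GWES (9081) are those named later in this paper, while MyProxy's customary default port 7512 is an assumption about this deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    service: str
    port: int


# Outgoing ports the Berlin clinic firewall permits, per the text above.
ALLOWED_OUTBOUND = frozenset({80, 443})

# Ports named in this paper (GSI-SSH 2222, GSI-FTP 2811, GWES 9081) plus
# MyProxy's usual default 7512, which is an assumption about this deployment.
GRID_ENDPOINTS = [
    Endpoint("GSI-SSH", 2222),
    Endpoint("GSI-FTP", 2811),
    Endpoint("GWES", 9081),
    Endpoint("MyProxy", 7512),
    Endpoint("Grid portal (https)", 443),
]


def unreachable(endpoints, allowed=ALLOWED_OUTBOUND):
    """Return the services a workstation behind the firewall cannot contact directly."""
    return [e.service for e in endpoints if e.port not in allowed]


print(unreachable(GRID_ENDPOINTS))
# ['GSI-SSH', 'GSI-FTP', 'GWES', 'MyProxy']
```

Only the https portal remains reachable, which is the argument for routing all Grid traffic through a web portal in the DMZ.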
1.1.2 End-to-End Security
End-to-end security in Grid environments implies authentication and authorization between all participating actors: users, resources, services and Grid components. The current method offered by Grid environments is GSI, which is based on PKI. GSI has several known shortcomings, for example its person-based rather than role-based or task-based security concept, which delegates the user's full personal authorization to Grid tasks. However, most of these shortcomings could be solved by modifications within the GSI framework, for example by using user-signed service certificates. They do not in principle call into question the use of PKI as a security concept, which relies on mutual authentication and authorization by X.509 certificates. While certificates are commonly used, e.g. for server or application authentication, user authentication by certificate is still not widely adopted. It can be assumed that user certificates will become more popular in the near future with digital document signing. However, Grid certificate management is much more complicated than plain document signing: distributed Grid tasks need delegated authorization and therefore require the private key belonging to the provided credential.

1.1.3 Monitoring of Grid Applications
Grid applications are remote applications. One feature of Grids is that resources and services may be selected automatically based on their current availability. Due to the complexity of Grid infrastructures, the reliability of academic Grids is usually low, resulting in frequent failures of Grid jobs [14]. The user has very little control over the resources and services and generally has no overview of whether a process was aborted because of a Grid-related error or because of incorrect usage [15, 16]. On the other hand, end users are not interested in job details if everything works fine. Therefore a multilevel information system is useful, in which overview information about the current jobs is given as well as detailed information about the individual jobs [17].

1.1.4 Privacy
A common method to fulfill data protection regulations is the anonymization of medical data. Anonymization in the context of medical data protection is defined as the irreversible removal of any possibly identifying data (IDAT) from the medical data (MDAT) [18]. Such data may then be made accessible to people other than the attending physician, e.g. for research or educational purposes. In many cases, however, the user of a Grid application requires reidentification of the results obtained within the distributed computing infrastructure; therefore irreversible anonymization is not an option [19]. The transport layer encryption offered by GSI-FTP [20] and other related transport protocols does not protect against privacy violations on the Grid nodes. File-based encryption of the data is not feasible either, as processing of encrypted data is today only applicable to relatively simple tasks. A possible solution is pseudonymization [21], in which the IDAT part of the data is replaced by a key that can only be related to the original IDAT by the attending physician. However, as pseudonymization implicitly enables reidentification, processing must follow a data protection concept approved by data protection commissioners.

1.2 Related Work
Many Grid projects have already begun to develop Grid portals, and some of them are very successful. The development of a Grid portal includes Grid user authentication, user task execution and workflow management. In the development of Grid web portals, a security gap has always existed when a proxy is generated by direct access to the user's public/private certificate
key pair. Although this is technically feasible, it is undesirable for security reasons. A solution was eventually found in projects such as the GILDA/GENIUS portal [22]: a so-called "robot" certificate may be issued for the Grid portal, and users can take advantage of the portal to retrieve credentials from this single certificate for all Grid job submissions. However, this causes another problem: jobs submitted by the same user through different interfaces (e.g., portal and command line) are accounted to different end entities, because each user interface uses its own robot certificate. Moreover, using robot certificates in the portal requires additional trust steps in the Grid authentication infrastructure. Another option, used in the SHIWA [23] portal based on the WS-PGRADE portal [24], relies on short-term credentials (also known as proxy certificates) that give Grid tasks full user rights for only a limited time to minimize abuse. As such short-term credentials are difficult to combine with compute-intensive jobs requiring automatic delegation over several days, mid-term credentials are stored on a "MyProxy server" [25], allowing for frequent renewal of the initial short-term credential. Understanding the differences between user credentials, the MyProxy certificate and the short-term credential, along with the use of the different tools to create them, was found to be quite confusing for non-expert computer users. Even more user-friendly GUI-based tools like the Java application MyproxyUploader [26] still require a lot of manual configuration, such as the DN6 of the MyProxy server, apart from the fact that they also require loose firewall settings. For the end user, it is also a complication to use an authentication infrastructure that is not integrated into the authentication of the Grid portal.
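The interplay between a short-term proxy and a mid-term MyProxy credential is essentially a timing calculation. The Python sketch below, with illustrative lifetimes of 12 hours for the proxy and 7 days for the stored credential, computes how many renewals a multi-day job would need and whether delegation could cover the whole run. It models only the timing under the assumptions stated in the docstring, not the MyProxy protocol itself.

```python
import math
from datetime import timedelta


def renewal_plan(job_runtime: timedelta,
                 proxy_lifetime: timedelta,
                 myproxy_lifetime: timedelta) -> tuple[int, bool]:
    """Return (proxy renewals needed, whether delegation can cover the whole job).

    Assumes the initial proxy and the MyProxy credential are both created at job
    start, and that each renewal fetched from MyProxy is valid for proxy_lifetime
    but never beyond the expiry of the stored mid-term credential.
    """
    if proxy_lifetime <= timedelta(0):
        raise ValueError("proxy lifetime must be positive")
    periods = math.ceil(job_runtime / proxy_lifetime)  # proxies needed in total
    renewals = max(periods - 1, 0)                      # the first one is the initial proxy
    covered = max(proxy_lifetime, myproxy_lifetime)     # latest point delegation can reach
    return renewals, job_runtime <= covered


# A three-day job with 12-hour proxies and a one-week MyProxy credential:
print(renewal_plan(timedelta(days=3), timedelta(hours=12), timedelta(days=7)))   # (5, True)
print(renewal_plan(timedelta(days=10), timedelta(hours=12), timedelta(days=7)))  # (19, False)
```

The second call shows why the mid-term credential itself must eventually be renewed by the user for very long runs.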
An integrated authentication component for Grid portals based on WS-PGRADE and the Django [27] software, called GridCertLib [28], has been developed; it leverages a Shibboleth [29]-based authentication infrastructure and the SLCS [30] online certificate signing service. In contrast, the Java applet solution Grid Proxy Upload Tool (GPUT [31], described in Section 2.2.5) is integrated in the Grid portal and requires relatively little manual configuration and no changes to the firewall settings.

There are also Grid portals for bioinformatics or biomedical research, such as MoSGrid [32], PopSciGrid [33], which is integrated with NCI's cancer Biomedical Informatics Grid (caBIG® [34]), and the GENIUS portal. However, they focus on the usability and visualization of data rather than on data security and data protection once the data leave the user's workstation (described in Sections 1.1.1–1.1.4). PL-Grid has developed a Grid portal7 with the Vine Toolkit8 for the Nano-Science Gateway [35]. The Adobe Flex9 and BlazeDS10 technologies are integrated into this toolkit to enable 3D visualization, as in the Nano Toolkit, but the user's web browser may block this feature. The Vine Toolkit framework also offers a Certificate Manager,11 either as a Java Web Start or as a standalone application, but neither variant considers port blocking by a firewall. A concept for the secure transfer of medical data, the so-called Secure Storage System [37], was implemented by the DECIDE project based on the neuGrid research infrastructure [36]. Medical data are encrypted with a randomly generated key, which is stored in a key repository inside the user's trusted environment, and are then transported to the Grid environment.

To increase usability without sacrificing security, we have implemented a Grid portal that provides easy-to-use and secure access to all important services and information in the Grid, for users as well as for application developers. The integrated application portlets offer users options to enter and upload the parameters required by each application. Another work on the EGEE production Grid [38] has also described a strategy for optimizing simple job submission. A workflow management service (GWES) [5] establishes a virtualization layer above the Grid middleware. The portal also offers a way to directly retrieve or download the results of the Grid applications regardless of the restrictions of the clinical network.

6 Distinguished Name: identification information in a certificate.
7 https://portal.plgrid.pl/
8 http://vinetoolkit.org/
9 http://labs.adobe.com/technologies/flex/
10 http://livedocs.adobe.com/blazeds/1/blazeds_devguide/index.html
11 http://vinetoolkit.org/content/certificate-manager-1
2 Components and Services Used in the Charité Grid Portal
The Charité Grid Portal is a secure web gateway to Grid-based medical applications developed within different Grid projects. The gateway is located in the demilitarized zone (DMZ) of the Charité hospital network, which enables communication with external systems such as D-Grid. This placement works around the restrictions imposed by the hospital's strict security policies. The gateway server hosts several services of PneumoGrid and MediGRID, for example web applications, data transfer and the Grid workflow execution service, and makes them available under the s-c04-miappl01.charite.de subdomain. It combines components from projects such as K-Wf Grid [39], MediGRID, PneumoGrid and Gap-SLC.12 The biomedical Grid projects PneumoGrid and MediGRID share a similar infrastructure, which currently consists of more than 5000 CPUs and about 1 PB of data storage at 6 locations. Both projects are based on Globus Toolkit 4 (GT4) [40], which provides basic mechanisms for communication, authentication, network information and data access. For more information see [41].

2.1 Portal Framework
In contrast to earlier work, the originally used Gridsphere-based portal framework was replaced by a Liferay portal due to reliability and performance issues. The Liferay Portal framework is a widely used open-source framework and comes with several portlets for collaboration. As Liferay is not specifically developed for Grid infrastructures but focuses on enterprise use, maintenance and further development are much more frequent.
Several international Grid projects have moved to Liferay, such as TeraGrid13 and, recently, WS-PGRADE [24]. Despite the JSR 168 compliance14 of Gridsphere portlets, all Grid-specific functionality such as credential handling had to be redeveloped, because it relied on built-in features of the Gridsphere core. The JSR 168 standard defines portlets as isolated entities, which makes it easy to integrate a single portlet into different portals. The successor standard (JSR 286)15 allows inter-portlet communication, which enables, for example, credential delegation between different portlets. Our portlets are therefore compliant with the JSR 286 standard, at the cost that possible portlet dependencies need to be respected.

2.2 Reusable GUI Components and Used Services
The software architecture is composed of four layers, as shown in Fig. 1. The top layer consists of the application portlets and the generic portlets (described in Section 3). They mainly consist of the different reusable GUI components. These components, shown in the second layer, realize the communication with the respective Grid services, located in the third layer. These services are provided by the various D-Grid project partners, from middleware developers to resource providers. Some of the reusable GUI components are made available by the service providers; others were developed by ourselves. We now describe the employed components, along with the services and resources they are related to.

Fig. 1 Software architecture of both projects: MediGRID and PneumoGrid

2.2.1 PACS Component
The PACS component provides a generic interface to medical image management systems, so-called Picture Archiving and Communication Systems (PACS) [41] (see Fig. 5). PACS instances use the DICOM standard, the de facto standard for both storing and transferring medical images. Within the MediGRID project, a GSI-enhanced version of the DICOM protocol (GridDICOM service) was developed, and a workflow-based web service for fault-tolerant data transfer (Reliable DICOM Transfer) has been established between the Grid nodes and a GridPACS that is populated with anonymized data and located in the demilitarized zone of the clinics.

12 http://gap-slc.awi.de/
13 https://www.xsede.org/wwwteragrid/archive/about.html
14 http://www.jcp.org/ja/jsr/detail?id=168
15 http://www.jcp.org/en/jsr/detail?id=286

2.2.2 Pseudonym Applet
The pseudonym applet transfers local lung sound data via GSI-SSH to the Grid nodes and pseudonymizes them using a key retrieved from the TempID service. The applet removes all possibly identifying data and metadata found in the audio files; only the plain audio channels are transferred to the Grid. To assign the pseudonymized data to the original data, an encrypted XML file is created and stored locally for reidentification by the attending physician.
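A minimal Python sketch of the pseudonymization step, assuming a recording arrives as a dictionary: identifying fields (IDAT) are split off, the remaining medical data (MDAT) are tagged with a random key, and the key-to-IDAT mapping is serialized as XML for local reidentification. The field names are hypothetical, uuid4 merely stands in for a key from the TempID service, and, unlike the applet described above, the sketch does not encrypt the XML file.

```python
import uuid
import xml.etree.ElementTree as ET

# Hypothetical field names treated as identifying data (IDAT).
IDAT_FIELDS = frozenset({"patient_name", "patient_id", "birth_date", "recording_site"})


def pseudonymize(record: dict) -> tuple[str, dict, dict]:
    """Split a recording into (pseudonym, MDAT sent to the Grid, IDAT kept locally)."""
    pseudonym = str(uuid.uuid4())  # stand-in for a key obtained from the TempID service
    idat = {k: v for k, v in record.items() if k in IDAT_FIELDS}
    mdat = {k: v for k, v in record.items() if k not in IDAT_FIELDS}
    mdat["pseudonym"] = pseudonym
    return pseudonym, mdat, idat


def mapping_xml(entries: dict) -> str:
    """Serialize {pseudonym: IDAT} for local reidentification (NOT encrypted here)."""
    root = ET.Element("pseudonyms")
    for pseudonym, idat in entries.items():
        entry = ET.SubElement(root, "entry", pseudonym=pseudonym)
        for key, value in sorted(idat.items()):
            ET.SubElement(entry, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


record = {"patient_name": "Doe, Jane", "patient_id": "4711",
          "sample_rate_hz": 4000, "channels": [[0.01, -0.02], [0.03, 0.00]]}
key, to_grid, kept_locally = pseudonymize(record)
print(sorted(to_grid))  # ['channels', 'pseudonym', 'sample_rate_hz']
print(mapping_xml({key: kept_locally}))
```

In a real deployment the XML mapping would be encrypted before being written to disk, as the applet does.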
The TempID service used by this applet issues pseudonymization keys that are unique during the processing time. These temporary IDs remain locked until released by the user. TempIDs can be retrieved via a generic web interface or via a SOAP/REST interface.

2.2.3 GWES Component
The GWES component realizes the user interface to the Grid Workflow Execution Service (GWES). GWES is a workflow management system originally developed within the K-Wf Grid project. It provides automatic and fault-tolerant job execution in Globus-based Grids, including data transfer between different Grid nodes [5]. At the core of GWES are the XML-based Grid Workflow Description Language (GWorkflowDL) and the Resource Description Language (D-GRDL). However, formulating workflow descriptions requires some knowledge of GWorkflowDL, which cannot be expected from domain researchers.
Fig. 2 Screenshot of GPUT and the credential management
Therefore, an additional layer is required between the end user and the workflow engine that automatically creates and submits workflows according to the user input. This was realized early on with application-specific portlets, in which several workflow templates are integrated with predefined software components and data flow. The user only needs to select the input data; the workflow description is then automatically generated from the template and passed to the execution service. The GWES component employs the web service interfaces (via SOAP or REST) offered by GWES. It provides an interactive web interface for monitoring, visualization and administration of workflows [42]; for example, the current status of a workflow can be monitored while it is running. An audit trail plugin [43] has also been developed by Fraunhofer IAO;16 it is used in the health care sector to enhance data privacy and to monitor modifications and data processing within clinical studies.
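The template mechanism can be pictured as substituting the user's selections into a stored workflow description. The Python sketch below does this for a deliberately simplified XML template; the element names only echo the Petri-net vocabulary (places, transitions, tokens) on which GWorkflowDL is based and do not reproduce the actual GWorkflowDL schema, the input URI is a placeholder, and the submission to GWES via SOAP or REST is omitted.

```python
from string import Template
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified template; it is NOT the real GWorkflowDL schema.
SIENAX_TEMPLATE = Template("""\
<workflow name=$name>
  <place ID="input"><token>$input_uri</token></place>
  <transition ID="sienax">
    <inputPlace placeID="input"/>
    <outputPlace placeID="result"/>
  </transition>
  <place ID="result"/>
</workflow>""")


def instantiate(template: Template, name: str, input_uri: str) -> str:
    """Fill a workflow template with user input, escaping the values for XML."""
    xml_text = template.substitute(name=quoteattr(name), input_uri=escape(input_uri))
    ET.fromstring(xml_text)  # fail early if the generated description is not well-formed
    return xml_text


wf = instantiate(SIENAX_TEMPLATE, "sienax-run-1",
                 "gsiftp://grid-node.example.org/data/scan_001.nii.gz")
print(wf)
```

Escaping the user-supplied values and parsing the result before submission keeps malformed input from reaching the workflow engine.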
16 http://www.iao.fraunhofer.de/index.php
2.2.4 Data Management Component
The data management component provides an interface to the services for data transfer from the local environment to Grid nodes: GSI-FTP and GSI-SSH, for secure, reliable and high-speed data transfer within Grids. GSI-FTP [20] is a GSI-based FTP protocol that provides secure, high-performance and reliable Grid data transfer and is optimized for high-bandwidth wide-area networks. GSI-SSH [44] is a GSI-enabled version of OpenSSH which, like GSI-FTP and GridDICOM, supports authentication and delegation of X.509 proxy certificates. It provides single sign-on remote login and file transfer. We use GSI-SSH to log in to remote Grid nodes and to transfer files between these servers without further user interaction.

2.2.5 Grid Proxy Upload Tool (GPUT)
On closer inspection of credential management, many configuration settings can either be preset for a specific Grid infrastructure or be retrieved automatically from the information provided with the user credential. As the user credential is created within the browser, it stands to reason to use the browser's certificate management rather than asking the user to set up a .globus directory, as many Grid clients expect. For this reason, we developed the Grid Proxy Upload Tool (GPUT) [31]. It is implemented as a Java applet and connects to a Java servlet on a backend server. GPUT (see Fig. 2) has substantially improved the transfer of proxy certificates to the MyProxy online credential repository (MyProxy server). GPUT has the following features:
– Automatic format conversion of the user certificate on the local PC
– Local proxy certificate creation
– Two-stage secure upload process: TLS-encrypted transfer from the user client to the GPUT server (the portal), and subsequent upload to the MyProxy server
– Easy combination with user authentication and credential management (Section 3.1.3) on a portal to minimize the user's input effort
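Because users may choose proxy and MyProxy credential lifetimes freely (Section 3.1.3), a client such as GPUT benefits from consistency checks before the upload. The Python sketch below encodes two plausible rules, which are assumptions rather than documented GPUT behavior: proxies delegated from the MyProxy server should not be requested for longer than the stored credential lives, and the stored credential should not outlive the user certificate it was derived from.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def check_lifetimes(cert_not_after: datetime,
                    credential_lifetime: timedelta,
                    proxy_lifetime: timedelta,
                    now: Optional[datetime] = None) -> List[str]:
    """Return the problems found; an empty list means the choice is consistent.

    Assumed rules: proxies delegated by MyProxy may not outlive the stored
    credential, and the stored credential may not outlive the user certificate.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    problems = []
    if credential_lifetime <= timedelta(0) or proxy_lifetime <= timedelta(0):
        problems.append("lifetimes must be positive")
    if proxy_lifetime > credential_lifetime:
        problems.append("proxy lifetime exceeds MyProxy credential lifetime")
    if now + credential_lifetime > cert_not_after:
        problems.append("MyProxy credential would outlive the user certificate")
    return problems


now = datetime(2012, 10, 1, tzinfo=timezone.utc)
expiry = datetime(2012, 12, 31, tzinfo=timezone.utc)
print(check_lifetimes(expiry, timedelta(days=7), timedelta(hours=12), now))  # []
print(check_lifetimes(expiry, timedelta(days=7), timedelta(days=8), now))
# ['proxy lifetime exceeds MyProxy credential lifetime']
```

Returning a list of messages rather than raising on the first problem lets a GUI show all issues at once.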
3 Grid Portal Implementation
The implementation was originally based on Liferay 5.2.3 and has recently been migrated to Liferay version 6. Most portlets exist for both Liferay versions.

3.1 Generic Portlets

3.1.1 GWES Portlet
Alternatively, users can access GWES services with a GWES command line client [42]. However, clinical users face two complications. First, the command line interface is not user-friendly. Second, clinical users cannot use the GWES services directly, because the GWES component tries to establish a connection to port 9081, which is often blocked in the clinic network. Within this portlet, the user request is therefore forwarded to a gateway server in the DMZ, from which the GWES service is started. The generic GWES portlet makes it possible to upload workflow descriptions as well as to monitor all workflows uploaded by a user. Detailed information about the individual workflow components can be retrieved by clicking on the graphical representation. The portlet offers several levels of detail, such as the overall status of the workflow (INITIATED, ACTIVE, RUNNING, TERMINATED and COMPLETED), the start and completion time, the number of completed steps, and up-to-date information about the individual workflow tasks (see Fig. 3). Such detail is not needed in everyday use; it is especially helpful in failure cases, since users can provide this information to the support teams if they cannot resolve the problem themselves. The status of a workflow submitted via the portlet is only visible to the workflow owner and the administrator. Users can interactively start or resume, suspend or abort selected workflows.

3.1.2 Data Management Portlet
The data management portlet offers users access to the remote file systems on the Grid nodes. It enables data transfer between the local computer and the Grid, especially for users behind strict firewalls. Uploaded files are first transferred via a TLS-encrypted connection to the gateway server (DMZ). The files are then transferred from the portal server via GSI-SSH or GSI-FTP (ports 2222 and 2811, respectively) to the chosen Grid node. Downloading files works in a similar manner. With this portlet, the user can upload raw data and download processing results without additional software clients or knowledge of the Linux console. We are aware that, when using the generic portlet, users must take care of data protection measures in advance. On-the-fly pseudonymization is only available in application-specific portlets, where the input data are well defined.

3.1.3 Credential Management Portlet
To ease credential handling, the distinguished name (DN) of the user is extracted from client authentication during the TLS handshake.
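Extracting the DN during the TLS handshake can be sketched with Python's standard ssl module: once a server socket has verified a client certificate, SSLSocket.getpeercert() returns its subject as nested tuples of (attribute, value) pairs. The helper below converts that structure into the slash-separated notation used by Globus tools. The abbreviation table is an assumption covering common attributes only, and in a Liferay deployment the DN would more likely be taken from the servlet container or a TLS-terminating web server.

```python
import ssl

# Abbreviations customary in Globus-style DNs such as /C=DE/O=GridGermany/CN=Jane Doe.
# This table is an assumption and covers common attributes only.
SHORT_NAMES = {
    "countryName": "C",
    "stateOrProvinceName": "ST",
    "localityName": "L",
    "organizationName": "O",
    "organizationalUnitName": "OU",
    "commonName": "CN",
}


def subject_to_dn(peercert: dict) -> str:
    """Turn the 'subject' entry of SSLSocket.getpeercert() into a slash-separated DN."""
    parts = []
    for rdn in peercert["subject"]:  # each RDN is a tuple of (attribute, value) pairs
        for attribute, value in rdn:
            parts.append(f"{SHORT_NAMES.get(attribute, attribute)}={value}")
    return "/" + "/".join(parts)


def client_dn(conn: ssl.SSLSocket) -> str:
    """DN of the client authenticated during the TLS handshake.

    Assumes the server context was created with verify_mode = ssl.CERT_REQUIRED and
    the Grid CA certificates loaded; otherwise getpeercert() may return None or {}.
    """
    cert = conn.getpeercert()
    if not cert:
        raise PermissionError("no verified client certificate")
    return subject_to_dn(cert)


# Shape of getpeercert()['subject'] for an invented user certificate:
sample = {"subject": ((("countryName", "DE"),), (("organizationName", "GridGermany"),),
                      (("organizationalUnitName", "Charite"),), (("commonName", "Jane Doe"),))}
print(subject_to_dn(sample))  # /C=DE/O=GridGermany/OU=Charite/CN=Jane Doe
```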
Before logging in, the user is asked to authenticate with his or her certificate from the browser's keystore. The extracted DN is stored in the portal database when the user logs in (Fig. 4 step 1) and can then be used for credential retrieval from the MyProxy server. As the MyProxy server used is preconfigured, the only input required from the user is the MyProxy passphrase. The credential is then persistently stored in the portal database (Fig. 4 step 6) and can be passed to the various Grid services (Fig. 4 step 7) initialized via the other portlets.

Fig. 3 Screenshot of the GWES portlet. A list of all initiated workflows and the detailed status of the selected workflow are shown

From time to time, the mid-term MyProxy credential needs to be renewed. This can easily be done using the credential management portlet, which essentially contains the GPUT applet (Fig. 4 step 2). The user needs to specify the location of his or her user credential and the password for the private key (Fig. 4 step 3). The certificate may be stored in different formats (.pem or .p12) and at different locations (browser's keystore or local file system). To generate the proxy credential, the user further needs to provide a password. As mentioned before, the MyProxy server is preconfigured and cannot be changed within one instance, but the proxy and credential lifetimes can be chosen freely. The proxy certificate is then
created locally and is send via an encrypted channel to the gateway server located in Charité DMZ (Fig. 4 step 4), which then pass it to the Myproxy server (Fig. 4 step 5). So all certificate-related user tasks can be realized within the portal. The only requirements are the user certificate being stored in the browser’s keystore and an https connection. 3.2 Application Portlets 3.2.1 FSL-SIENAX Portlet The FSL-SIENAX portlet provides tools for brain segmentation with the FSL SIENAX method [45]. The portlet is divided into different panels (see Fig. 5): The “Start” panel provides status information about initialized workflows. The “File management” panel includes the PACS component. Users can select certain image data in this component as input to the SIENAX segmentation process in the Grid. The patient data will be anonymized (patient relevant information such as patient-ID are removed from DICOM-header) transferred to GridPACS. The workflow is initiated by pressing the “Run workflow” button in
The Charité Grid Portal
Fig. 4 The workflow of the credential handling
Fig. 5 The screenshot of FSL-SIENAX portlet
719
720
the next panel, where the details about the input parameter of the workflow is displayed again. The user can also check the workflow status whether it is completed and the results are available to download. The GWES workflow visualization component is also integrated into the portlet, to allow monitoring of THE SIENAX-jobs without using the generic portlet.
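The anonymization step of the FSL-SIENAX portlet, which removes patient-related attributes such as the patient ID from the DICOM header, can be sketched as a simple filter. This minimal Python sketch models the header as a dictionary keyed by DICOM attribute keywords; the attribute list is an illustrative assumption, not the portal's configuration and not a complete de-identification profile. A production implementation would operate on real DICOM objects via a DICOM library.

```python
# Sketch: strip patient-identifying attributes from a DICOM header before upload.
# The header is modelled as a plain dict keyed by DICOM attribute keywords;
# the attribute list below is illustrative, NOT a complete de-identification
# profile and NOT the list used by the portal.

IDENTIFYING_KEYWORDS = frozenset({
    "PatientID",
    "PatientName",
    "PatientBirthDate",
    "PatientAddress",
    "OtherPatientIDs",
    "InstitutionName",
    "ReferringPhysicianName",
})


def anonymize_header(header):
    """Return a copy of the header without identifying attributes."""
    return {k: v for k, v in header.items() if k not in IDENTIFYING_KEYWORDS}


if __name__ == "__main__":
    header = {
        "PatientID": "123456",      # hypothetical values
        "PatientName": "Doe^Jane",
        "Modality": "MR",
        "SliceThickness": "1.0",
    }
    print(anonymize_header(header))  # {'Modality': 'MR', 'SliceThickness': '1.0'}
```

Note that DICOM confidentiality profiles often require some attributes to be replaced by dummy values rather than deleted; the sketch simply removes them, in line with the description above.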
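The GWES status information displayed by the generic portlet (Section 3.1.1) and by the FSL-SIENAX portlet, together with the rule that a workflow's status is visible only to its owner and the administrator, can be sketched as follows. The status names come from the text; treating TERMINATED and COMPLETED as final states and the `may_control` rule are assumptions of this Python sketch, not documented GWES behaviour, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Workflow status values shown by the GWES portlet (Section 3.1.1)."""
    INITIATED = "INITIATED"
    ACTIVE = "ACTIVE"
    RUNNING = "RUNNING"
    TERMINATED = "TERMINATED"
    COMPLETED = "COMPLETED"


# Assumption: no further transitions happen once a workflow is TERMINATED or COMPLETED.
FINAL_STATES = frozenset({Status.TERMINATED, Status.COMPLETED})


@dataclass
class Workflow:
    workflow_id: str
    owner_dn: str  # DN of the submitting user, as stored by the portal
    status: Status


def visible_workflows(workflows, user_dn, is_admin=False):
    """Owners see their own workflows; an administrator sees all of them."""
    return [w for w in workflows if is_admin or w.owner_dn == user_dn]


def is_finished(workflow):
    """True if the workflow is in one of the (assumed) final states."""
    return workflow.status in FINAL_STATES


def may_control(workflow, user_dn):
    """Hypothetical rule: only the owner may suspend/resume/abort an unfinished workflow."""
    return workflow.owner_dn == user_dn and not is_finished(workflow)


if __name__ == "__main__":
    workflows = [
        Workflow("wf-1", "/C=DE/CN=Alice", Status.RUNNING),
        Workflow("wf-2", "/C=DE/CN=Bob", Status.COMPLETED),
    ]
    print([w.workflow_id for w in visible_workflows(workflows, "/C=DE/CN=Alice")])  # ['wf-1']
    print(len(visible_workflows(workflows, "/C=DE/CN=Admin", is_admin=True)))  # 2
    print(may_control(workflows[0], "/C=DE/CN=Alice"))  # True
```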
3.2.2 Lung Sound Analysis Portlet
This portlet implements Grid-based lung sound analysis for patients with chronic obstructive pulmonary disease (COPD) within the PneumoGrid [46] project. The data to be analyzed are lung sound recordings made with the Pulmotrack [47] system, which offers wheeze identification, characterization and quantification. The Grid services perform further analyses, for example pattern matching against collected reference data and complex biosignal analysis [48]. The portlet uses the TempID service, the pseudonym applet and the data management component. We describe the entire process in six steps:
1. Grid service initialization: The user presses the "Get New TempID" button. The portlet calls the TempID service. When the TempID has been retrieved, the pseudonym applet is started (see Fig. 6).
2. Local pseudonymization and data transfer to the gateway using the pseudonym applet: A file browser opens that shows only the relevant audio files (in WM6 format). The user selects the datasets to be processed. After the file selection is confirmed, pseudonymization starts. An XML file records which TempID is assigned to which record. The prepared audio files are transferred via GSI-SSH to a specific directory on the gateway (see Fig. 6).
3. Data transfer to the Grid: As soon as local pseudonymization is finished, a workflow template is completed and stored together with the data on the gateway. All data on the gateway, comprising all audio files and the workflow description, are then transferred. The receiving site is a secured Grid.
4. Start of the Grid workflow: When the transfer is completed, the workflow description is uploaded to the GWES and started, and the results are transferred back to the gateway.
5. Local reidentification by the pseudonym applet.
6. Release of the TempID: When the user has retrieved all results into the local environment, he or she enters the TempID into the "Release Locked TempID" field and presses the button. The portlet then calls the TempID service, which returns the TempID to the pool for reuse.
Fig. 6 Screenshot of the lung sound analysis portlet
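Steps 1 and 6 rely on the TempID service handing out pseudonyms from a pool and reclaiming them afterwards. The following Python sketch models that lock-and-release behaviour in memory; the class `TempIDPool`, its methods and the example IDs are hypothetical, and the real TempID service is a Grid service whose interface is not specified here.

```python
from collections import deque


class TempIDPool:
    """In-memory stand-in for the TempID service (hypothetical interface)."""

    def __init__(self, temp_ids):
        self._free = deque(temp_ids)  # TempIDs available for assignment
        self._locked = set()          # TempIDs currently assigned to a user

    def acquire(self):
        """Step 1: hand out a free TempID and lock it."""
        if not self._free:
            raise RuntimeError("no free TempID available")
        temp_id = self._free.popleft()
        self._locked.add(temp_id)
        return temp_id

    def release(self, temp_id):
        """Step 6: unlock a TempID and return it to the pool for reuse."""
        if temp_id not in self._locked:
            raise KeyError(f"TempID {temp_id!r} is not locked")
        self._locked.remove(temp_id)
        self._free.append(temp_id)


if __name__ == "__main__":
    pool = TempIDPool(["T-001", "T-002"])
    temp_id = pool.acquire()
    print(temp_id)  # T-001
    pool.release(temp_id)
```

Locking the TempID until the user explicitly releases it ensures that the same pseudonym is not handed to a second user while results for the first are still outstanding.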
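Steps 2 and 5, local pseudonymization with an XML log and later reidentification, can likewise be sketched with Python's standard `xml.etree.ElementTree`. The naming scheme (`<TempID>_<running number>` plus the original extension), the XML element and attribute names, and the assumption that result files carry the pseudonym as their name are all hypothetical; the paper does not specify the pseudonym applet's formats. The property illustrated is that the mapping file stays on the workstation while only pseudonymized names leave it.

```python
import os
import xml.etree.ElementTree as ET


def pseudonymize(file_names, temp_id):
    """Map each original file name to a pseudonym built from the TempID.

    Hypothetical naming scheme: <TempID>_<running number><original extension>.
    """
    return {
        name: f"{temp_id}_{i:03d}{os.path.splitext(name)[1]}"
        for i, name in enumerate(file_names, start=1)
    }


def mapping_to_xml(mapping, temp_id):
    """Serialize the mapping; this file stays on the user's workstation."""
    root = ET.Element("pseudonymization", tempid=temp_id)
    for original, pseudonym in mapping.items():
        ET.SubElement(root, "record", original=original, pseudonym=pseudonym)
    return ET.tostring(root, encoding="unicode")


def reidentify(xml_text, pseudonym):
    """Return the original file name for a pseudonym, or None if unknown."""
    root = ET.fromstring(xml_text)
    for record in root.iter("record"):
        if record.get("pseudonym") == pseudonym:
            return record.get("original")
    return None


if __name__ == "__main__":
    mapping = pseudonymize(["mueller_2011-05-03.wm6", "schmidt_2011-05-04.wm6"], "T-001")
    log = mapping_to_xml(mapping, "T-001")
    print(mapping["schmidt_2011-05-04.wm6"])  # T-001_002.wm6
    print(reidentify(log, "T-001_002.wm6"))   # schmidt_2011-05-04.wm6
```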
4 Results of the Usability Case Study
As part of this work, we carried out a small case study to obtain user feedback on the usability of our portal solution. Participants rated the portal on subjective scales ranging from 1 (very good) to 5 (insufficient) according to the following criteria, which address the four main usability issues presented in Section 1.1:
C1: How efficient is the user interface? Does it require additional preparation time, e.g. for installing software, for a network administrator to open network ports or enable special transport protocols, or for extra training?
C2: How complicated is our credential management solution to use compared with the MyProxy Uploader as an alternative tool? How much additional knowledge about certificates is required to manage Grid authentication?
C3: How easy is it to submit a job for execution of the FSL and lung sound analysis applications via the user interface? Is familiarity with the (XML-based) workflow description language necessary?
C4: How easy is the reidentification of the lung sound application's patient data after they have been processed in the Grid?
Twenty-two people participated in this case study. Eight are biomedical researchers, three of whom have experience with Grid infrastructures. Six are computer scientists, three of whom work in the medical domain; they work with science portals, but only two of them have prior experience with Grid infrastructures. The remaining eight people have no IT background and, in particular, no deeper knowledge of Grid infrastructures; they can be regarded as untrained domain researchers. All participants are academic postgraduates aged between 25 and 55.
The results of the case study are shown in Fig. 7 and indicate good user acceptance. Users need neither administrator rights nor special firewall settings on their workstations. A few input operations (four for the FSL application, three for the lung sound analysis application) suffice to submit a job, and no special knowledge of the XML format is required. On the other hand, the results show that users still find it difficult to understand the certificate and credential mechanism. User comments suggest that they cannot distinguish between the certificate password and the MyProxy password; managing many passwords tends to confuse users. To reidentify the patient data, users must remember the TempID stored on their workstation, which was the main criticism regarding C4.
Fig. 7 Usability case study result
5 Conclusions and Outlook
To minimize specific security problems of the life sciences community and to increase functionality and usability, we have successfully integrated existing and newly developed components into a Grid portal. Accessing Grid services via a web portal is a promising solution for domain researchers working in protected environments and without specific knowledge of the Grid [49]. With both generic and application-specific portlets, the portal provides a platform-independent and relatively user-friendly interface to several Grid services and resources. The certificate-based security infrastructure fulfills the requirements of data protection legislation in many areas, and it has been integrated into the relevant services and applications. Credential handling has been implemented to be as user-friendly as possible, so that it is at least easier for users with little experience of certificates. Although much progress has been made in recent years, there is still room for improvement in many areas. In particular, interactive visual inspection and correction of intermediate results are currently under development. Such requirements often arise in complex image processing of clinical data, where transferring intermediate results is highly inefficient because the data sets can be several gigabytes in size. However, recent developments in HTML5¹⁷ that ease the building of rich internet applications, together with the general trend toward web applications, offer several opportunities to improve the functionality and usability of Grid portals.
Acknowledgements We thank all collaborators from D-Grid for their strong support. This work is supported by the German Federal Ministry of Education and Research (BMBF) within the PneumoGrid (011G09002A-H) project as part of the D-Grid initiative, Somnonetz (01EZ1132), and the ITEA2 project easiClouds (01IS11021F); and by the SHIWA project (European Commission's FP7 INFRASTRUCTURES-2010-2 grant agreement n° 261585).
17 http://www.html5rocks.com/en/
References
1. Breton, V., Dean, K., Solomonides, T., et al.: The healthGrid white paper. Stud. Health Technol. Inform. 112, 249–321 (2005)
2. Olabarriaga, S.D., Glatard, T., de Boer, P.T.: A virtual laboratory for medical image analysis. IEEE Trans. Inf. Technol. Biomed. 14, 979–985 (2010)
3. Camarasu-Pop, S., Glatard, T., Benoit-Cattin, H., Sarrut, D.: Enabling Grids for GATE Monte-Carlo radiation therapy simulations with the GATE-Lab. In: Applications of Monte Carlo Method in Science and Engineering (2011)
4. Kottha, S., Peter, K., Stenke, T., Bart, J., Falkner, J., Weisbecker, A., Viezens, F., Mohammed, Y., Sax, U., Hoheisel, A., Ernst, T., Sommerfeld, D., Krefting, D., Vossberg, M.: Medical image processing in MediGRID. In: Proceedings of the German e-Science Conference, Baden-Baden (2007)
5. Hoheisel, A.: Grid-workflow-management. In: Pfreundt, F.-J., Linden, J., Weisbecker, A., Unger, S. (eds.) Fraunhofer Enterprise Grids—Software. Fraunhofer IRB Verlag, Stuttgart (2008)
6. Kaufman, L.M.: Data security in the world of cloud computing. IEEE Secur. Priv. 7(4), 61–64 (2009)
7. Krieger, S., Leitner, M.T.: Grid Gateway—Accessing Grid resources from private networks, vol. 3. Austrian Grid Symposium. RISC-Linz Report Series No. 09–14 (2009)
8. Alfieri, R., Cecchini, R., Ciaschini, V., dell'Agnello, L., Frohner, A., Lörentey, K., Spataro, F.: From Gridmap-file to VOMS: managing authorization in a Grid environment. Future Gener. Comput. Syst. 21, 549–558 (2005)
9. Beck-Ratzka, A., Büchner, O., et al.: Grid Resource Registration Service (in German). [Online] 13 Jan 2012. [Cited: 08 Aug 2012]. http://www.d-grid.de/fileadmin/user_upload/documents/Kern-D-Grid/Betriebskonzept/D-Grid-Betriebskonzept.pdf
10. Foster, I., Kesselman, C., Tsudik, G., Tuecke, S.: A security architecture for computational Grids. In: Proceedings of 5th ACM Conference on Computer and Communications Security Conference, pp. 83–92 (1998)
11. Adams, C., Lloyd, S.: Understanding PKI: Concepts, Standards, and Deployment Considerations. Addison-Wesley Professional. ISBN 978-0-672-32391-1 (2003)
12. Gutman, P.: Engineering Security (Book Draft). [Online] 2011. [Cited: 08 Aug 2012]. http://www.cs.auckland.ac.nz/~pgut001/pubs/book.pdf
13. Helbing, K., Demiroglu, S.Y., Rakebrandt, F., Pommerening, K., Rienhoff, O., Sax, U.: A data protection scheme for medical research networks. Review after five years of operation. Methods Inf. Med. 49(6), 601–607 (2010)
14. Plankensteiner, K., Prodan, R., Fahringer, T., Kertész, A., Kacsuk, P.: Fault-tolerant behavior in state-of-the-art. CoreGRID Technical Report, 18 Oct 2007
15. Schopf, J.M.: Ten actions when Grid scheduling: the user as a Grid scheduler. In: Schopf, J.M., Weglarz, J., Nabrzyski, J. (eds.) Grid Resource Management. Kluwer Academic Publishers, Norwell, MA, USA, pp. 15–23 (2004)
16. Andreetto, P., Andreozzi, S., Ghiselli, A., Marzolla, M., Venturi, V., Zangrando, L.: Standards-based job management in Grid systems. J Grid Computing 8(1), 19–45 (2010)
17. Wieczorek, M., Hoheisel, A., Prodan, R.: Taxonomies of the multi-criteria Grid workflow scheduling problem. In: Talia, D., Yahyapour, R., Ziegler, W. (eds.) Grid Middleware and Services: Challenges and Solutions, vol. III, pp. 237–264. Springer, New York (2008)
18. Krefting, D., Wu, J., Hoheisel, A., Siewert, R., Sebert, M., Canisius, S., Drepper, J.: A generic data protection concept for distributed medical image and biosignal processing. In: Proceedings of the HealthGrid Conference, Bristol (2011)
19. Tinabo, R., Mtenzi, F., O'Driscoll, C., O'Shea, B.: Anonymisation vs. Pseudonymisation: which one is most useful for both privacy protection and usefulness of e-healthcare data. In: Proc. IEEE 4th International Conference for Internet Technology and Secured Transactions (ICITST), pp. 1–6 (2009)
20. Bresnahan, J., Link, M., Khanna, G., Imani, Z., Kettimuthu, R., Foster, I.: Globus GridFTP: what's new in 2007 (Invited Paper). In: Proceedings of the First International Conference on Networks for Grid Applications (GridNets 2007) (2007)
21. Heurix, J., Neubauer, T.: A methodology for the pseudonymization of medical data. Int. J. Med. Inf. 80(3), 190–204 (2011)
22. Barbera, R., Donvito, G., Falzone, A., La Rocca, G., Milanesi, L., Maggi, G.P., Vicario, S.: The GENIUS Grid Portal and robot certificates: a new tool for e-Science. [Online] 10(Suppl 6), S21, 16 June 2009. [Cited: 08 August 2012]. http://www.biomedcentral.com/1471-2105/10/S6/S21
23. SHIWA: SHIWA. SHaring Interoperable Workflows for large-scale scientific simulation on Available DCIs. [Online] SHIWA. [Cited: 08 Aug 2012]. http://www.shiwa-workflow.eu/ (2010)
24. Kacsuk, P.: P-GRADE portal family for Grid infrastructures. J. Concurr. Comput.: Practice & Experience (2011)
25. Novotny, J., Tuecke, S., Welch, V.: An online credential repository for the Grid: MyProxy. In: IEEE Proceedings of the Tenth International Symposium on High Performance Distributed Computing (HPDC10), pp. 104–111 (2001)
26. Meredith, D., Rogers, W.: Certificate Management Wizard (MyProxy Uploader) Tutorial. NGS. [Online] [Cited: 08 Aug 2012]. http://www.ngs.ac.uk/tools/certwizard/tutorial
27. Django: Django. The Web framework for perfectionists with deadlines. [Online] Django. [Cited: 08 Aug 2012]. https://www.djangoproject.com/
28. Murri, R., Kunszt, P.Z., Maffioletti, S., Tschopp, V.: GridCertLib: a single sign-on solution for Grid web applications and portals. J. Grid Computing 9(4), 441–453 (2011)
29. Shibboleth: Shibboleth. [Online] [Cited: 08 Aug 2012]. http://shibboleth.net/index.html
30. SWITCH: Description of the SLCS. [Online] [Cited: 08 Aug 2012]. http://www.switch.ch/grid/slcs/about/about_long.html
31. Falkner, J.: Einfache Nutzung von Zertifikaten—Grid Proxy Upload Tool (GPUT) (in German). GapSLC Abschluss-Workshop. 09 June 2011. [Online] [Cited: 08 Aug 2012]. http://gap-slc.awi.de/documents/abschlws/falkner_gput.pdf
32. Birkenheuer, G., Blunk, D., Breuers, S., Brinkmann, A., dos Santos Vieira, I., Fels, G., Gesing, S., Grunzke, R., Herres-Pawlis, S., Pachschies, L., Schäf P., et al.: MoSGrid—a molecular simulation Grid as a new tool in computational chemistry, biology and material science. J. Cheminform. 3(Suppl 1), P14 (2011)
33. Ding, L., Lebo, T., Erickson, J.S., DiFranzo, D., Williams, G.T., Li, X., Michaelis, J., Graves, A., Hendler, J., et al.: TWC LOGD: a portal for linked open government data ecosystem. Journal of Web Semantics: Science, Services and Agents on the World Wide Web (2011)
34. Komatsoulis, G.A.: Collaboration in cancer research community: cancer Biomedical Informatics Grid (caBIG). In: Hupcey, M.A.Z., Williams, A.J., Ekins, S. (eds.) Collaborative Computational Technologies for Biomedical Research. Wiley (2011)
35. Dziubecki, P., Grabowski, P., Krysiński, M., Kuczyński, T., Kurowski, K., Szejnfeld, D.: Modern portal tools and solutions with Vine Toolkit for science gateways. In: Proceedings of International Workshop on Science Gateways for Life Sciences (2011)
36. Redolfi, A., McClatchey, R., Anjum, A., Zijdenbos, A., Manset, D., Barkhof, F., Spenger, C., Legré, Y., Wahlund, L., Barattieri di San Pietro, C., Frisoni, G.: Grid infrastructures for computational neuroscience: the neuGRID example. Future Neurol. 4(6), 703–722 (2009)
37. Ardizzone, V., Barbera, R., Calanducci, A., Fargetta, M., La Rocca, G., Monforte, S., Pistagna, F., Rotondo, R., Scardaci, D.: The DECIDE science gateway. In: Proceedings of International Workshop on Science Gateways for Life Sciences (2011)
38. Lingrand, D., Montagnat, J., Martyniak, J., Colling, D.: Optimization of jobs submission on the EGEE production Grid: modeling faults using workload. J Grid Computing 8(2), 305–321 (2010)
39. Bubak, M., Unger, S.: K-Wf Grid—The Knowledge-based Workflow System for Grid Applications. In: Proceedings of Cracow Grid Workshop. [Online] [Cited: 08 Aug 2012]. http://www.gridworkflow.org/kwfgrid/docs/K-WfGrid_PROCEEDINGS.pdf (2006)
40. Foster, I.: Globus Toolkit version 4: software for service-oriented systems. In: IFIP International Conference on Network and Parallel Computing, pp. 2–13 (2006)
41. Krefting, D., Bart, J., Beronov, K., Dzhimova, O., Falkner, J., Hartung, M., Hoheisel, A., Knoch, T.A., Lingner, T., et al.: MediGRID: towards a user friendly secured Grid infrastructure. Futur. Gener. Comput. Syst. 25(3), 326–336 (2009)
42. Hoheisel, A., Linden, T., Nowakowski, P.: GWES User Manual. The Generic Workflow Execution Service. [Online] [Cited: 08 Aug 2012]. http://www.gridworkflow.org/kwfgrid/gwes-web/KWF-WP2-D2-FIRSTGWESUserManual.pdf
43. Fraunhofer FIRST: Introduction to the Audit Trail Plugin. GWES. [Online] Fraunhofer FIRST, 31 Aug 2011. [Cited: 08 Aug 2012]. http://www.gridworkflow.org/kwfgrid/gwes-plugin-audit-trail/index.html
44. University of Chicago: GSI-OpenSSH. Globus. [Online] Globus Alliance. [Cited: 08 Aug 2012]. http://www.globus.org/toolkit/docs/5.0/5.0.0/security/openssh/
45. Smith, S.M., Zhang, Y., Jenkinson, M., Chen, J., Matthews, P.M., Federico, A., De Stefano, N.: Accurate, robust and automated longitudinal and cross-sectional brain change analysis. NeuroImage 17(1), 479–489 (2002)
46. Canisius, S., Hoheisel, A., Penzel, T., Gross, V., Ploch, T., Krefting, D.: PneumoGRID—telemedizin und GRID technologie zur visualisierung der lungenbelüftung bei COPD (in German). In: Proceedings of Telemedizin. Berlin (2009)
47. AccuraMed: Pulmotrack®—Respiratory Acoustic Monitor. AccuraMed. [Online] [Cited: 08 Aug 2012]. http://accuramed.be/?page_id=192
48. Krefting, D., Canisius, S., Hoheisel, A., Loose, H., Tolxdorff, T., Penzel, T.: Grid based sleep research—analysis of polysomnographies using a Grid infrastructure. Future Gener. Comput. Syst. (2010). doi:10.1016/j.future.2010.03.008
49. Shahand, S., Santcroos, M., Mohammed, Y., Korkhov, V., Luyf, A., van Kampen, A., Olabarriaga, S.: Frontends to biomedical data analysis on Grids, Bristol, UK. In: Proceedings of HealthGrid (2011)