GS3: a Grid Storage System with Security Features

V. D. Cunsolo · S. Distefano · A. Puliafito · M. Scarpa
Abstract Technological trends and the advent of worldwide networks, such as the Internet, have made computing systems more and more powerful, increasing both their processing and storage capabilities. In Grid computing infrastructures, the data storage subsystem is physically distributed among several nodes and logically shared among several users. This highlights the necessity of a) availability for authorized users only, b) confidentiality, and c) integrity of information and data: in one term, security. In this work we face the problem of data security in Grids by proposing a lightweight cryptography algorithm that combines the strong and highly secure asymmetric cryptography technique (RSA) with symmetric cryptography (AES). The proposed algorithm, which we named the Grid secure storage system (GS3), has been implemented on top of the Grid file access library (GFAL) of the gLite middleware, in order to provide a file system service with cryptography capability and a POSIX interface. The choice of implementing GS3 as a file system, the GS3FS, also allows the file system structure to be protected, and overcomes the well-known problem of file rewriting in gLite/GFAL environments. In the specification of the GS3FS, particular care is devoted to providing a usable user interface and to implementing a file system with low impact on the middleware. The final result is the introduction into the gLite middleware of a new storage Grid service whose overall characteristics, to the best of the authors' knowledge, have never been offered before. The paper describes and details both the GS3 algorithm and its implementation; the performance of such an implementation is evaluated, discussing the obtained results and possible application scenarios in order to demonstrate its effectiveness and usefulness.

Keywords Integrity; Confidentiality; File System; Grid; gLite; GFAL.
Università di Messina, Dipartimento di Matematica
Contrada Papardo, S. Sperone, 98166 Messina, Italy
E-mail: {vdcunsolo, sdistefano, apuliafito, mscarpa}@unime.it

1 Introduction

Current IT trends definitely move towards network-distributed computing paradigms. Among them, the Grid [12,13,33,11] is one of the most successful, due to its capability of managing large amounts of computing and storage resources transparently to users. These latter only
have to specify their jobs' requirements, and then the Grid system manager automatically determines where they are executed and which resources have to be allocated [33]. Sharing data in distributed multi-user environments triggers problems of information security concerning data confidentiality and integrity. Several works on Grid data management [34,40,7] recognize and identify data security and data privacy as major issues. Grid middlewares usually provide resource management capabilities, ensuring security in accessing services and in communicating data, but they often lack data protection against direct malicious accesses at system level. In other words, the fact that data are disseminated and stored in remote distributed machines, directly accessible by their administrators, constitutes the main risk for data security in Grid environments. Insider attacks [47,54,53,7] are often not adequately covered in the Grid context. It is therefore mandatory to introduce an adequate data protection mechanism, which denies data intelligibility to unauthorized users, even when they are (local) system administrators.
The goal of our work is to provide a mechanism capable of storing data in a Grid environment in a secure way, the Grid secure storage system (GS3). In order to do that, we propose to combine both symmetric and asymmetric encryption. The main contribution of this paper is therefore the specification of a lightweight and effective technique for secure data storage in Grid environments, which reconciles high security goals with performance requirements. The GS3 technique we propose has been implemented in the gLite middleware in order to demonstrate the feasibility of the approach, supported by the encouraging results obtained in terms of performance. Other interesting contributions of GS3 to the state of the art are: 1) the organization of the Grid data into a file system, named GS3FS; 2) the protection of both data/files and the file system structure; and 3) the introduction of the capability of file rewriting in gLite storage systems, not yet implemented by the gLite grid file access library (GFAL) [37]. Moreover, in order to further demonstrate the feasibility of the proposed technique, we discuss possible application scenarios of GS3, trying to fix some guidelines and, possibly, a methodology for adapting it to real-world applications, with particular regard to those organizations that implement information classification.
The paper is organized as follows: after a short introduction of background concepts in section 2, we describe, from a high-level point of view, the GS3 algorithm (section 3) and its implementation in the gLite middleware (section 4). In section 5 we focus on the GS3 file system implementation, providing details on file organization, access and management. Then, in section 6, the results obtained by evaluating our implementation are discussed. Possible application scenarios for GS3 are investigated in section 7. Finally, section 8 proposes some remarks, through a discussion of the benefits and drawbacks of the GS3 technique and possible future extensions.
2 State of the Art and Background

2.1 Security

Information or data security helps to ensure privacy and to protect personal data. IEEE defines data security as “the degree to which a collection of data is protected from exposure to accidental or malicious alteration or destruction” [29]. The International Organization for Standardization specified information security in the ISO/IEC 27002 standard [30]. It provides best practice recommendations on information security management for those who are responsible for initiating, implementing or maintaining Information Security Management Systems (ISMS). More specifically, information security is defined within the standard
as “the preservation of confidentiality (ensuring that information is accessible to authorized users only), integrity (safeguarding the accuracy and completeness of information and processing methods) and availability (ensuring that authorized users have access to information and associated assets when required)”.
Several techniques and technologies have been specified in the literature in order to achieve data security; they can be classified and summarized into four classes: masking, backup, erasure and encryption. The masking of structured data is the process of obscuring specific data within a database table or cell, to ensure that data security is maintained and sensitive customer information is not leaked outside of the authorized environment. Backup techniques refer to making copies of data so that the additional copies may be used to restore the original after a data loss event, in order to improve the reliability and availability of data and, consequently, their integrity. Data erasure ensures the complete destruction of all data, so that no sensitive data are leaked in case of data deletion or when an asset is retired or reused. Data encryption refers to cryptography techniques that encrypt data on a storage device. Encryption typically takes form in either software or hardware, and is based on cryptography theory.
Cryptography is the study of mathematical techniques related to aspects of information security such as confidentiality, data integrity, entity authentication, and data origin authentication [35]. In recent times it is considered a branch of both mathematics and computer science, and it is closely affiliated with information theory, computer security, and engineering. There are two basic types of cryptography systems: symmetric (also known as conventional or secret key) and asymmetric (public key). Symmetric ciphers require both the sender and the recipient to have the same key: this key is used by the sender to encrypt the data, and again by the recipient to decrypt them. The most widely used symmetric cryptography algorithm is the advanced encryption standard (AES) [48], also known as Rijndael. It is a block cipher adopted as an encryption standard by the U.S. government and developed by two Belgian cryptographers, Joan Daemen and Vincent Rijmen. With asymmetric ciphers, each user has a pair of keys: a public key and a private key. Messages encrypted with one key can only be decrypted by the other key. The public key can be published, while the private key is kept secret. One of the most interesting asymmetric cryptography algorithms is RSA [42], developed in 1977 by Ron Rivest, Adi Shamir and Leonard Adleman at MIT. Asymmetric ciphers are much slower, and their key sizes must be much larger than those used with symmetric ciphers. At the moment, only the brute force attack is effective against both the AES and the RSA algorithms, but it requires great computing power and long elaboration times to obtain the key, especially in the latter case.
An interesting technique that combines and synthesizes the high security of asymmetric cryptography algorithms with the efficiency of the symmetric approach is PGP (Pretty Good Privacy) [16]. In PGP, data are encrypted by using symmetric cryptography; then, in order to secure the symmetric key, an asymmetric cryptography algorithm is applied to it, since this ensures high security. An algorithm similar to PGP has been developed by GNU in the GPG (GNU Privacy Guard) open source project [20].
With regard to security issues in the Grid computing environment, a taxonomy is represented in Fig. 1 [8,7]. This shows how complex and articulated the problem of Grid security is at different levels (host, architecture, infrastructure, virtual organization), involving several stakeholders (users, administrators, resources, networks). This problem has mainly been addressed by considering the different issues separately, and then aggregating the corresponding solutions into specific frameworks. Plenty of solutions have thus been conceived.
Fig. 1 Taxonomy of Grid computing security issues.
The Grid Security Infrastructure (GSI) [23], developed independently and later integrated as part of the OGSA standards, addresses all the stated architectural concerns. GSI is based on proven standards such as public key encryption, X.509 certificates, and the Secure Sockets Layer (SSL), and enables secure authentication and communication over computer networks. The latest version of the GSI, based on the Globus Toolkit 4.0, also supports Web services based security. Another interesting solution is the Web service (WS) security architecture, a layered structure built on WS-Security [39,43]. WS-Security is a set of standards that describe the security mechanisms in a Web services scenario through extensions of the SOAP header providing message integrity and confidentiality. WS-Security is flexible and is designed to be used as the basis for the construction of a wide variety of security models, including public key infrastructure (PKI) [42], Kerberos, and SSL. Specifically, WS-Security provides support for multiple security tokens, multiple trust domains, multiple signature formats, and multiple encryption technologies. It provides mechanisms for the propagation of security tokens, message integrity and message confidentiality. These mechanisms by themselves do not provide a complete security solution. Instead, WS-Security is a building block that can be used in conjunction with other Web service extensions and higher-level application-specific protocols (WS-Policy, WS-Trust, WS-Privacy, WS-Secure Conversation, WS-Federation, WS-Authorization) to accommodate a wide variety of security models and encryption technologies. An interesting project on the topic, coded as GridSec [26], mainly focuses on trusted Grid computing with dynamic resources and automated intrusion responses. The project develops a self-configuring security and privacy framework to support trusted Grid applications. The GridSec architecture gives early warnings to prevent system failures in Grid resource sites under massive cyberspace attacks over the Internet. Other interesting readings on the topic can be found in [7,55].
Among the security topics dealt with by such techniques, information security has great relevance, as also highlighted in Fig. 1. As stated in [8], the concerns at the information security level of the Grid can be broadly described as issues pertaining to secure communication, authentication, and issues concerning single sign-on and delegation. Even though Fig. 1 also reports confidentiality and integrity, such aspects are usually partially or totally ignored by Grid middlewares. Thus, they lack protection mechanisms against insider attacks, which are common vulnerabilities of Grid computing environments, particularly felt in storage Grids. The problem of secure Grid storage has mainly been faced in the literature as the definition of access rights [32], in particular addressing problems of data sharing, whilst the encoding of the data is left to the user, since no automatic mechanism to access a secure storage space in a transparent way has been defined.
In [45], the authors propose a technique for securing data disseminated over the Grid gLite environment based on symmetric cryptography (AES). The key security is entrusted to a unique keystore server that stores it, to which all the data access requests must be notified in order to decrypt the data. This algorithm implements a spatial security policy: the security lies in physically hiding and securing the keystore server, and the access to the keystore is physically restricted and monitored in order to protect it from malicious users, external attacks and insider abuses.
Shamir in [46] studied in depth the problem of data access in distributed environments, proposing a solution based on symmetric keys. The secret sharing scheme splits the (AES) symmetric key into n parts, which can be recomposed if and only if at least k ≤ n parts are available ((k, n) threshold scheme; see the toy sketch at the end of this subsection). Moreover, in order to prevent unauthorized accesses to the symmetric key, the author proposes to subdivide it over different servers named key-servers. In this way, the system is both resistant to attacks and reliable, since n − k + 1 key-servers have to be compromised or have to fail in order to make the key unavailable and, consequently, data inaccessible. A similar technique has been specified by Brunie et al. in [6], also used in Perroquet [5] to modify Parrot [51], a tool for attaching existing programs to remote I/O systems through the filesystem interface, by adding an encrypted file manager. The main contribution of such work is that the (AES) symmetric key is decomposed into n chunks and can be rebuilt if and only if all the n parts are available. HYDRA [27] implements a data sharing service in gLite 3.0 medical environments, securing data by using symmetric cryptography and splitting the keys among keystore servers [38]. HYDRA provides controlled access to the encryption key (through certificate DN-based ACLs, similarly to what is done for files) and secured communication to the requester, by implementing the Shamir secret sharing scheme on the keystore servers.
All the above mentioned proposals are based on symmetric cryptography. Most of them implement key splitting algorithms. The underlying idea of the key splitting approach is that at least a subset of the systems (key servers) over which the keys are distributed will be trustworthy. This approach addresses data-sharing and reliability/availability issues well, but it is weak from both the security point of view, since the servers holding key parts have to be adequately secured, even against the system administrators who can always access the keys, and the performance point of view, since there is a significant overhead in rebuilding a key, depending on the number of parts into which it is split. These are serious problems to address in an alternative way; in fact, even though higher security could be achieved by increasing the number of key parts and consequently of keystores (anyway vulnerable to insider abuses), this would have a heavy impact on performance. A better solution is required.
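As a concrete illustration of the (k, n) threshold scheme discussed above, the following toy sketch implements Shamir secret sharing over a small prime field. It is illustrative only: the tiny modulus and fixed polynomial coefficients make it insecure; a real key-server deployment works in a large prime field with cryptographically random coefficients.

```c
/* Toy sketch of Shamir's (k, n) threshold scheme over GF(p). NOT secure:
 * the modulus is tiny and the coefficients are fixed for reproducibility. */
#include <stdio.h>

#define P 2087L /* small prime field modulus (toy value) */

static long powmod(long b, long e, long m) {      /* b^e mod m */
    long r = 1; b %= m;
    for (; e > 0; e >>= 1) { if (e & 1) r = r * b % m; b = b * b % m; }
    return r;
}

static long eval(const long *c, int k, long x) {  /* Horner: f(x) mod P */
    long y = 0;
    for (int i = k - 1; i >= 0; i--) y = (y * x + c[i]) % P;
    return y;
}

/* Rebuild the secret f(0) from k shares (x[i], y[i]) by Lagrange interpolation. */
static long reconstruct(const long *x, const long *y, int k) {
    long s = 0;
    for (int i = 0; i < k; i++) {
        long num = 1, den = 1;
        for (int j = 0; j < k; j++) {
            if (j == i) continue;
            num = num * ((P - x[j]) % P) % P;        /* (0 - x_j) mod P   */
            den = den * ((x[i] - x[j] + P) % P) % P; /* (x_i - x_j) mod P */
        }
        s = (s + y[i] * num % P * powmod(den, P - 2, P)) % P; /* Fermat inverse */
    }
    return s;
}

int main(void) {
    const int k = 3, n = 5;
    long coeff[3] = {1234, 166, 94};  /* coeff[0] is the secret key material */
    long x[5], y[5];
    for (int i = 0; i < n; i++) { x[i] = i + 1; y[i] = eval(coeff, k, x[i]); }
    long sx[3] = {x[0], x[2], x[3]}, sy[3] = {y[0], y[2], y[3]}; /* any k shares */
    printf("secret=%ld rebuilt=%ld\n", coeff[0], reconstruct(sx, sy, k));
    return 0;
}
```

With these toy values, any three of the five shares rebuild the secret, while any two reveal nothing about it.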
2.2 File Systems

Many decades of research on distributed file systems have provided a wealth of optimized solutions (network file system [44], Andrew file system [25], GPFS [28], Z file system [36], Google file system [17], LUSTRE [49], etc.). In the Grid context, the idea of specifying a filesystem specifically conceived for data and storage Grids is widely recognized and has had considerable success in the specific literature. The Global Grid Forum (GGF) has invested considerable effort in the specification of a Grid file system and a POSIX-like I/O interface, by creating a working group specifically focused on the topic [18]. Such efforts were concretized in a document detailing the design issues, requirements and status of the standard Grid File
System Architecture [19]. To date, no specific implementation of such a file system has been documented. An interesting document that confirms and outlines the need for a complete Grid filesystem to support file manipulation at Grid and inter-Grid levels is reported in [34]. It identifies the main characteristics a Grid filesystem has to provide, also presenting a first basic prototype implementation of such a filesystem and its performance evaluation. Among such characteristics, primary importance is attributed to the security of data and files.
There are several attempts to adapt existing distributed file systems to Grid environments. Systems such as GridNFS [24], SGFS [59] and WOW [15] have augmented NFS by adding security and clustering capabilities for use on the Grid. GPFS-WAN [3] implements a file system interface to several supercomputer centers from nodes on the NSF TeraGrid. The Avaki Data Grid [2] uses a federated approach to serve several file systems to clients: a Share Server works as a data source for a Data Grid Access Server, which, in turn, acts as a local NFS server that can be mounted on client machines. A similar approach is followed by GFarm [50], LegionFS [22] and Ceph [58], which provide a file system constructed from multiple custom storage devices, also available in Grid environments.
The alternative to starting from existing implementations is to develop a file system from scratch. This allows the features of the filesystem to be customized, optimizing it according to user requirements. An example is GridFS [4] which, based on the Object based Storage Architecture model and MPI I/O, implements a high-performing filesystem eliminating most access overheads and optimizing metadata by using Direct Parallel I/O, also addressing problems of distributed cache coherence. SlashGrid [21] is a framework for Grid-aware file systems, providing file access management based on Grid PKI certificates. Data are accessible both from a standard Grid user interface, using Grid VO certificates to authorize access to local files, and via the Web (read-only). In the latest version the authors aim at providing transparent, secured access to bulk storage on neighboring machines in storage farms, by using the users' X.509 proxy certificates and VOMS credentials when reading/writing remote files. Chirp [52] allows an ordinary user to easily deploy, configure, and harness distributed storage without requiring any kernel changes, special privileges, or attention from the system administrator at either client or server.
In terms of usability, an interesting approach is to build on the Filesystem in USErspace (FUSE) [1] library. The FUSE kernel module traps the calls made to the Linux VFS and redirects them to a file system implementation running in user space. The already cited GFarm and SlashGrid, and ELFI [10], a filesystem interface to both the replica catalog and the gLite/LCG-EGEE SE, implement such an approach.
3 GS3: The Grid Secure Storage System

The main goal of this paper is to achieve data security in storage Grids specifically conceived for providing users with access to a huge amount of storage space. In such a context, data confidentiality and integrity must be pursued avoiding both outsider and, in particular, insider attacks: no one except authorized users can access the contents of sensitive information, not even system administrators. In order to achieve data security, the best solution is cryptography. As discussed in section 2, the most successfully adopted approach is symmetric cryptography, due to its performance advantage over the asymmetric one. The best solution is therefore to encrypt data by exploiting a symmetric cryptography algorithm, shifting the security problem towards a problem of securing and hiding the symmetric key (DataKey).
With regard to DataKey securing and hiding, the key splitting algorithm is a solution that only partially achieves confidentiality and integrity, as discussed in section 2. Insider abuses are not adequately covered by such an approach, since administrators can access the key chunks. A more effective solution is required, also taking into account performance issues. For this reason, we propose to encrypt the DataKey with the authorized users' public keys, applying asymmetric encryption to it. In this way, only the authorized users that can decrypt the DataKey can also access the data.
The proposed technique, combining both symmetric and asymmetric encryption, allows the achievement of high information security, covering the related threats. In [7], three different categories of threats to services are identified: Quality of Service (QoS) violation, unauthorized service access, and Denial-of-Service (DoS). With particular regard to Grid storage services, no QoS requirements are considered, thus QoS violation attacks are not possible, while unauthorized service access threats are usually faced by exploiting standard authentication and authorization techniques such as PKI. DoS threats have to be adequately addressed. In the specific case, the set of DoS threats to take into consideration has to be restricted to the GS3 goals: data confidentiality and integrity. From such a point of view only one DoS threat can be identified: data corruption. Other DoS and distributed DoS (DDoS) threats mainly regard networking at different levels (SYN flood attacks, routing table “poisoning”, spoofing, XML based DoS attacks, SQL injection attacks, schema “poisoning”, replay attacks, etc.) and are therefore out of the scope of this paper. A good reference covering such aspects specifically in Grid environments is [7].
The combined symmetric-asymmetric encryption technique allows confidentiality targets to be fully accomplished. In fact, the only possible attack on confidentiality is the brute force attack, against which a 128-bit symmetric key is enough; the symmetric key is encoded by the user public key, and the user's keys are secured by the user. It is instead really hard to fully accomplish integrity targets in a distributed environment, since administrators always have to be able to remove or modify data. A possible countermeasure to malicious operations is to make it difficult to identify the data of authorized users. We therefore propose to structure the data through a file system, further decomposing files into blocks stored in the Grid storage, and to encrypt the file system structure. In this way only authorized users can access the file system structure, increasing data integrity. Malicious administrators can still delete or modify blocks, but without having any information about the data stored there. The GS3 availability is instead entirely delegated to the Grid, which has to provide a reliable storage service implementing adequate data redundancy at different levels (RAID disks, replica management, etc.).
The main goal of the proposed technique is the achievement of high information security. With this aim, in this section we provide a logical description of the GS3 approach, mainly addressing the security goal. Some other aspects, like data sharing, are taken into consideration, but always from the security perspective. With regard to data sharing, data consistency, access control management, X.509 certificates and rights management issues have to be adequately addressed.
In this first version of GS3, all the users accessing and sharing data have the same rights to execute, read and modify them. In other words, the provided solution exclusively covers information security issues, also with regard to data sharing, and can be considered as the starting point for implementing a more complex and complete Grid data management framework. That being stated, in the following subsection 3.1 we describe the logical security architecture, while subsection 3.2 details the algorithm that puts such an architecture into practice.
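Before detailing the architecture, the following minimal sketch shows the symmetric+asymmetric combination in practice, using OpenSSL's envelope API (EVP_Seal*): a random symmetric session key encrypts the data, and a copy of that key is RSA-encrypted for each recipient public key, one copy per authorized user, mirroring the K^i_PUB(K) copies GS3 keeps on the storage. The function name and error handling are our own; this illustrates the pattern, it is not the actual GS3 code.

```c
/* Envelope (hybrid) encryption sketch: AES for the data, RSA for the key. */
#include <openssl/evp.h>

/* Encrypts in[0..inlen) for npub recipients. On success, ek[i] holds the
 * session key encrypted with pubkey[i]; each ek[i] must be allocated with
 * EVP_PKEY_size(pubkey[i]) bytes, and out with inlen + one cipher block. */
static int envelope_seal(EVP_PKEY **pubkey, int npub,
                         const unsigned char *in, int inlen,
                         unsigned char **ek, int *eklen, unsigned char *iv,
                         unsigned char *out, int *outlen)
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int len = 0, ok = 0;
    *outlen = 0;

    /* EVP_SealInit generates the random AES session key internally and
     * RSA-encrypts one copy of it per recipient public key. */
    if (ctx &&
        EVP_SealInit(ctx, EVP_aes_128_cbc(), ek, eklen, iv, pubkey, npub) > 0 &&
        EVP_SealUpdate(ctx, out, &len, in, inlen) > 0) {
        *outlen = len;
        if (EVP_SealFinal(ctx, out + *outlen, &len) > 0) {
            *outlen += len;
            ok = 1;   /* data encrypted; session key wrapped for every user */
        }
    }
    EVP_CIPHER_CTX_free(ctx);
    return ok;
}
```

The matching EVP_Open* calls perform the inverse on the recipient side: the private key unwraps the session key, which then decrypts the data; only holders of a matching private key can do so.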
3.1 Logical Security Architecture
Fig. 2 GS3 logical security architecture.
As stated before, GS3 combines both symmetric and asymmetric cryptography in a hierarchical approach, ensuring high security. The logical architecture of such an approach is depicted in Fig. 2. The ith authorized user, authenticated by his/her own X.509 certificate through the user interface, contacts the Grid storage system where his/her data are located. Data in the Grid storage are encrypted by a symmetric cryptography algorithm whose symmetric DataKey K is stored in the Grid storage itself, in its turn encrypted by the ith user public key K^i_PUB, obtaining the encrypted DataKey K^i_PUB(K). In this way, only the user that has the matching private key K^i_PRIV can decrypt the symmetric DataKey and therefore the encrypted data. The encrypted DataKey K^i_PUB(K) is stored together with the data in order to allow the owner to access the data from any node of the Grid infrastructure: he/she only needs the smartcard containing the private key. In order to implement data sharing, the DataKey K is saved into the Grid storage, replicated in as many copies as there are users authorized to access the data. The key-ring depicted in Fig. 2 pictorially represents this fact.
In the proposed algorithm, the decryption is exclusively performed on the authorized users' nodes hosting the corresponding X.509 certificates. Once decrypted, the DataKey, the data and the related information are kept in unswappable memory locations of such nodes to avoid malicious accesses. In this way the highest security is achieved and ensured: data and keys are always encrypted when they are far from the user, both in the remote storage and while being transferred; they are in clear only when the trusted authorized user host is reached, always and exclusively kept in an unswappable memory page of the user space.
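How such unswappable memory can be obtained is platform-specific; a minimal sketch for POSIX systems, with invented helper names, is the following. mlock() pins the pages in RAM so the key material can never be written to the swap area; a hardened version would wipe buffers with a primitive the compiler cannot optimize away (e.g. explicit_bzero).

```c
/* Pin sensitive buffers (DataKey, file index) in unswappable memory. */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

void *secure_alloc(size_t len) {
    void *p = NULL;
    if (posix_memalign(&p, 4096, len) != 0)  /* page-aligned allocation */
        return NULL;
    if (mlock(p, len) != 0) {                /* pin: pages never hit swap */
        free(p);
        return NULL;
    }
    return p;
}

void secure_free(void *p, size_t len) {
    if (!p) return;
    memset(p, 0, len);   /* wipe key material before releasing it */
    munlock(p, len);
    free(p);
}
```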
3.2 Algorithm

From an algorithmic point of view, the logical security architecture just described can be decomposed into two steps: 1) the symmetric DataKey K is encrypted through the user public key K_PUB and written into the Grid storage; then 2) K is ready to be used for data encryption. The algorithm implementing such a mechanism can be better decomposed into four sub-activities: initialization, data sharing, data I/O and termination, detailed in the following subsections.
Fig. 3 GS3 initialization phase algorithm.
3.2.1 Initialization

The first phase of the GS3 algorithm is devoted to the initial setting of the distributed environment. The step-by-step algorithm describing the initialization phase is reported in the form of an activity diagram in Fig. 3. Once the ith user logs into the Grid environment through the user interface, the GS3 algorithm requests from the Grid storage system the symmetric DataKey K encrypted by the public key K^i_PUB of the user. If the Grid storage has already been initialized, and the user is authorized to access the data, its answer contains the encrypted DataKey K^i_PUB(K), which is decrypted by the ith user private key K^i_PRIV and then saved in a safe memory location in the user interface. Otherwise, two options are possible for the user accessing the Grid storage: the data storage has not yet been initialized, and therefore a new DataKey K must be created by the user interface side of the algorithm, then encrypted and sent to the other side; otherwise, if the user does not want to create a new data storage, the connection with the Grid storage element is closed. This latter case often means that the ith user is not authorized or does not have the right credentials to access the data, i.e. the copy of the DataKey encrypted by the user public key (K^i_PUB(K)) is not stored or does not exist in the SE.

3.2.2 Data Sharing

As introduced and discussed above, in order to access a specific data set, the generic jth user needs a copy of the DataKey K used to encrypt such data. In the GS3 system, this requirement is translated into the necessity that the storage element (SE) stores a copy of K encrypted by the user public key K^j_PUB, i.e. K^j_PUB(K). If the jth user has created the data set, K^j_PUB(K) is automatically stored into the SE at the initialization phase, as reported in Fig. 3. Otherwise it is necessary that another authorized user, the ith one, creates a copy of K, encrypts it with the jth user public key, thus obtaining K^j_PUB(K), and stores this latter into the Grid storage, as specified in the activity diagram of Fig. 4. After that, the jth user is able to access the data set.
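The two key-management flows of Figs. 3 and 4 can be condensed in the following C-like pseudocode sketch; Request, Decrypt, Generate, Encrypt and Store are placeholders for the operations named in the diagrams, not real library calls.

```c
/* Initialization (Fig. 3): obtain the DataKey K for user i, or create it. */
key_t *gs3_session_key(user_t *i, storage_t *se) {
    enc_key_t *ek = Request(se, i);           /* ask the SE for K^i_PUB(K)  */
    if (ek != NULL)                           /* storage initialized, user  */
        return Decrypt(ek, KPRIV(i));         /* authorized: recover K      */
    if (NewStorage(se)) {                     /* first use: create the key  */
        key_t *K = Generate();
        Store(se, Encrypt(K, KPUB(i)));       /* store K^i_PUB(K) on the SE */
        return K;
    }
    return NULL;                              /* forbidden access: close    */
}

/* Sharing (Fig. 4): user i grants user j access by storing K^j_PUB(K). */
void gs3_grant(user_t *i, user_t *j, storage_t *se) {
    key_t *K = gs3_session_key(i, se);
    if (K != NULL)
        Store(se, Encrypt(K, KPUB(j)));
}
```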
Fig. 4 GS3 sharing request processing.
3.2.3 Data I/O
Fig. 5 GS3 data I/O primitives algorithm: read (a), write (b) and generic ops (c).
GS3 organizes the data stored in the Grid storage through a file system structured in directories. The data are managed and accessed by using the well-known I/O primitives open, close, read, write, unlink, access, chmod, closedir, create, lseek, lstat, mkdir, opendir, readdir, rename, rmdir and stat. In Fig. 5, the algorithms implementing read, write and the other generic operations (unlink, rename, create, lseek, etc.) are represented by activity diagrams. In particular, the read algorithm of Fig. 5(a) implies the decryption of the data received from the Grid storage, while the write algorithm of Fig. 5(b) requires the encryption of the data before they are sent to the storage system. A generic operation instead only sends a command or a signal, as shown in Fig. 5(c).

3.2.4 Termination

The termination phase algorithm is described by the activity diagram of Fig. 6.
Fig. 6 GS3 termination phase algorithm.
Before the user logs out of the Grid, it is necessary to remove the symmetric DataKey and the other reserved information from the user interface memory. However, since a user could still have one or more data I/O operations alive, he/she may want to know the status of such operations, asking the Grid storage system about them. Then, by evaluating the obtained answer, he/she can choose to terminate the current session or to wait for the completion of the operations. Finally, the user logs out of the Grid.
4 GS3 Implementation over gLite

The data security algorithm combining symmetric and asymmetric cryptography detailed in section 3 has been implemented as a service in the gLite Grid middleware. In order to describe such an implementation, in subsection 4.1 we introduce the constraints and requirements motivating the implementation choices, and then we detail the storage architecture (subsection 4.2) and the designed library (subsection 4.3).
4.1 Requirements and Specifications

Since the GS3 implementation must be integrated in the gLite environment, which uses its own storage libraries (GFAL), the best available solution to simplify the use of the Grid secure storage and to better integrate such an implementation into the gLite middleware is to implement an encrypted file system on top of GFAL. In order to ensure high security, it is also necessary that the secure storage service be available in interactive mode from the UI, which exclusively performs the data decryption. Moreover, for data sharing purposes, it is also necessary to define a specific structure for storing the encrypted DataKey copies related to the corresponding authorized users. In this implementation, we chose the AES [48] algorithm for symmetric encryption and the public key infrastructure (PKI) [42] for asymmetric cryptography. Moreover, for the sake of simplicity and portability towards other paradigms, a POSIX interface has been implemented.
4.2 Storage Architecture
Fig. 7 GS3 gLite Implementation Architecture (a) and File System (b).
The architecture implementing the GS3 algorithm in the gLite middleware, satisfying the requirements and specifications described above, is depicted in Fig. 7(a). The GS3 storage service creates a virtual file system structuring the data in files, directories and subdirectories, without any restrictions on levels and number of files per directory. Since we build this architecture on top of GFAL, in GS3 all data objects are seen as files stored on the SE, accessible by users through the GFAL interface (LFN, SRM, GUID, etc.). Thanks to the storage architecture and the internal organization, the GS3 implementation provides all the benefits of a file system. One of the most interesting is the capability of file modification and/or rewriting, an operation not implemented by the GFAL library: GFAL only allows new files to be created and existing ones to be deleted, without any possibility of modifying them once they have been created.
A GS3 file can be entirely stored in the SE in one chunk of variable length, or it can be split into several blocks with a fixed, user-defined length specified in the GS3 setup configuration, as reported in Fig. 7(b). The blocks are encrypted independently in order to achieve better performance since, as discussed in section 3, all the users accessing the data are considered trusted and have the same rights. In this way, replay attacks in which authorized group members could exchange a block with another one do not make sense, since the selection of group members avoids this possibility. To avoid conflicts among file names, we uniquely identify each chunk of data stored on the SE by a GUID identifier. The file index (GS3FI) shown in Fig. 7(b) maps a file to the corresponding blocks in the SE. Such a file index is encrypted through the symmetric DataKey and is kept in UI unswappable memory locations. In this way the user operates on a virtual file system whose logical structure usually does not correspond to its physical structure on the SE, since each file can be split into many blocks stored on the SE as files. The main goal of file indexing is the optimization of the file I/O operations, since it reduces the data access time. Moreover, since GS3 file rewriting and modification on the SE has to be implemented through GFAL primitives, these operations are performed by deleting the file and rewriting its modified version; splitting a GS3 file into several SE files is the only feasible way to reach this goal. In order to implement data sharing it is necessary, as discussed above, that all the u authorized users have a copy of the DataKey encrypted by the corresponding public key (K^i_PUB with i = 1,..,u) stored in the Grid storage.
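The following hypothetical sketch illustrates the in-memory structures behind the GS3FI index and the GS3FBC cache just described; all field and type names are invented for illustration. The whole index is serialized, encrypted with the DataKey K and stored on the SE as K(GS3FI).

```c
#include <sys/types.h>

#define GUID_LEN 37               /* textual GUID, 36 chars + '\0' */

typedef struct {                  /* one encrypted block stored on the SE */
    char   guid[GUID_LEN];        /* SE file name: the unique GUID of the chunk */
    size_t length;                /* payload bytes actually used in the block */
} gs3_block_t;

typedef struct {                  /* one entry of the GS3 file index (GS3FI) */
    char        *path;            /* virtual path seen by the user */
    mode_t       mode;            /* POSIX permissions of the virtual file */
    off_t        size;            /* logical file size */
    unsigned     nblocks;         /* blocks the file is split into */
    gs3_block_t *blocks;          /* ordered list: maps the file to SE files */
} gs3_file_t;

typedef struct {                  /* one GS3FBC slot, in unswappable memory */
    char           guid[GUID_LEN];
    unsigned char *data;          /* block content, decrypted on the UI */
    int            dirty;         /* to be flushed to the SE at close/flush */
} gs3_cache_slot_t;
```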
The file system is created and stored on the SE when the GS3 initialization is performed. Each file referring to data stored on the SE is encrypted by a symmetric DataKey stored on the same SE and encrypted by the user public key. In order to optimize the performance of the file I/O operations, a local cache of encrypted chunks (GS3FBC) is held in the UI unswappable memory. All the operations involving chunks already loaded in the UI cache are performed locally, varying the content of such chunks. When a file is closed, the blocks stored in the cache are written back to the SE. A specific GS3 primitive (gs3_flush) is provided to force the flushing of data from the UI cache to the SE storage. This remarkably speeds up the storage system, reducing the number of accesses to the SE. Problems of cache coherence may arise if there is more than one simultaneously active access to the Grid storage working on the same data, i.e. in case of data sharing, as stated in section 3. We chose to face the problem by implementing a lazy consistency semantics, as also done in NFS [44], allowing different unsynchronized copies of the same data to exist in local caches. More specifically, we apply a relaxed consistency protocol, leaving the problem of data synchronization to the users accessing such data simultaneously.

4.3 GS3 Interface Library and API

Since the library commands implement a POSIX.1 interface, the access to a file on the virtual encrypted file system is similar to the access to a local file. GS3 specifies the same set of library functions as GFAL: in the former case the functions are prefixed by "gs3_*", while in the latter case by "gfal_*". The main difference between GS3 and a POSIX interface lies in the initialization and termination phases, as described in section 3.2. In the following we specify the GS3 primitives starting from the same phase characterization identified above.

4.3.1 Initialization

The initialization phase is the most important phase of the GS3 gLite implementation. In this phase the library context must be initialized by the user, setting the specific GS3 environment variables: GS3_PATH (URL of the device where data files are stored), GS3_PUBKEY (user's public key, used to encrypt) and GS3_PRVKEY (user's private key, used to decrypt). In case both GS3_PUBKEY and GS3_PRVKEY are left unspecified, by default the GS3 system tries to detect the smart-card device in the UI for performing the asymmetric cryptography operations. If no smart-card device has been plugged into the UI and the GS3 environment variables are wrongly initialized or not initialized at all, the GS3 management system signals the error. Even though the smart-card identification is optional for the GS3 algorithm, the highest security standard is achieved by smart-card identification, since keys stored locally on the UI are exposed to security attacks.
A generic user needing to access data stored in a Grid storage system has to invoke the gs3_init(const char *path) function from his/her UI in order to read the symmetric DataKey K encrypted by the user public key K^i_PUB from the SE, specifying its URL, URL_K, in the path argument. As shown in Fig. 8, and as also introduced in subsection 3.2.1, two cases distinguish the first access from the successive ones. In both cases a gs3_init invocation submits a query to the SE searching for K^i_PUB(K), verifying whether the corresponding file exists.
Fig. 8 The gs3_init library initialization primitive: first use (a) and following uses (b).
Such a check is implemented through the gfal_stat(const char *filename, struct stat *statbuf) primitive, which uses the URL_K parameter specified in the
gs3_init call as the filename path, and returns a buffer statbuf containing the information about the file (ST) and the exit code (Res). If the query generates a file miss or, in other words, the K^i_PUB(K) file corresponding to the specified URL does not exist, gfal_stat returns an error code corresponding to the file-not-found condition.
This case corresponds to the first initialization phase, in which it is necessary to initialize the SE for storing GS3 data. Thus, the gs3_init algorithm has to generate a new symmetric DataKey K_NEW as a sequence of random numbers by executing an OpenSSL function; the key is then encrypted through the user public key and sent to the SE. In order for the SE to store the new encrypted DataKey K^i_PUB(K_NEW), it is necessary to create the corresponding file by invoking int fd=gfal_open(const char *filename, int flags, mode_t mode), which opens the file specified in the filename path string, with the mode (read, write, create, append, etc.) specified in flags, returning the file descriptor fd. mode must be specified when O_CREAT (create) is among the flags, and is ignored otherwise. In the specific case, the file path is specified by URL_KN and, since a new file must be created, flags corresponds to O_CREAT and the data can be read by the owner only (400 mode). Then the key can be stored into the SE by a gfal_write(int fd, void *buf, size_t size) operation, specifying the file descriptor fd (the FD_KN returned by the previous gfal_open), the output buffer address buf (K^i_PUB(K_NEW)), and the amount of data to write (C) in the size parameter, as depicted in Fig. 8(a). Once stored into the SE, the encrypted key file can be closed by a gfal_close(FD_KN) operation. In the same way, the GS3 file index (GS3FI) is created by invoking a gfal_open returning the corresponding file descriptor FD_FI, but in this case such a file must be rewritable by the owner (600 mode).
Otherwise, in case the SE already stores K^i_PUB(K), the GS3 storage has already been initialized. Thus, as shown in Fig. 8(b), gs3_init loads K^i_PUB(K) and the encrypted file index K(GS3FI) from the storage element by two consecutive gfal_open–gfal_read sequences. The two files are opened in read-only mode (O_RDONLY); the gfal_read primitive has the same prototype as gfal_write. As above, once the key is loaded into the UI, the SE file is closed by executing gfal_close(FD_K). It is important to remark that, in both cases, the DataKey and the other sensitive data, such as the GS3FI and the data blocks, are always encrypted during communication over the network, and are placed into unswappable memory pages in the UI after decryption, to avoid malicious accesses.
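Putting the first-use branch of Fig. 8(a) together, the sequence of GFAL calls looks roughly as follows. The sketch is written against the prototypes quoted above; the header name, the URL arguments and the O_WRONLY addition (needed to write the newly created file) are our assumptions, and error handling is reduced to early returns.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include "gfal_api.h"   /* assumed header exposing gfal_stat/open/write/close */

/* First-use initialization: store K^i_PUB(K_NEW) on the SE if absent. */
int gs3_init_first_use(const char *url_k, const void *enc_key, size_t c)
{
    struct stat st;

    if (gfal_stat(url_k, &st) == 0)
        return 0;                          /* K^i_PUB(K) already on the SE */

    /* file miss: create the encrypted DataKey file, owner read-only (400) */
    int fd = gfal_open(url_k, O_CREAT | O_WRONLY, 0400);
    if (fd < 0)
        return -1;
    if (gfal_write(fd, (void *)enc_key, c) < 0) {   /* store K^i_PUB(K_NEW) */
        gfal_close(fd);
        return -1;
    }
    return gfal_close(fd);                 /* the GS3FI file is created the
                                              same way, with 600 mode */
}
```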
4.3.2 Data Sharing
Fig. 9 The gs3_share data sharing primitive.
The data sharing is implemented in GS3 by the gs3_share primitive. A user i that wants to share the data stored into an SE with a generic jth user has to invoke gs3_share(const char *pubkey) from his/her UI (UI_i) after initializing the GS3 system, specifying the X.509 PKI certificate of user j as parameter (pubkey=X509_j). The gs3_share algorithm extracts the public key of user j (K^j_PUB) and then creates a new copy of the DataKey K. As reported in Fig. 9, gs3_share then encrypts such a copy of K with the jth user public key, and stores K^j_PUB(K) into the SE by first invoking a gfal_open to create the file on the SE, and then writing the encrypted key through a gfal_write invocation. Thus, the jth user can access the data by calling the gs3_init primitive, which loads K^j_PUB(K).

4.3.3 Data I/O

GS3 data I/O operations are implemented through POSIX primitives. Table 1 reports all the I/O primitives implemented in GS3. Files are always encrypted in memory; the encryption is performed at runtime. To improve the GS3 performance and the usability of its library, the accessed files' chunks are buffered into a local file blocks cache (GS3FBC) in the UI until the corresponding files are closed. At file closing, the UI GS3FBC must be synchronized with the SE.
More specifically, the gs3_read(int fd, void *buf, int c) primitive reads c bytes of data of the file referred to by the fd file descriptor, placing them in the local UI buffer buf. As pictorially described in Fig. 10(a), gs3_read starts by querying the GS3 file index with the required file descriptor fd, obtaining the set of blocks (BLK1[ ]) into which the GS3
file is decomposed. Such blocks are then searched in the local cache. The blocks not present in the GS3FBC, identified by the set BLK2[ ] ⊆ BLK1[ ], must be loaded from the SE by invoking a sequence of gfal_open–gfal_read operations for each of them, represented in Fig. 10(a) by a cumulative operation specified on the whole array BLK2[ ]. Generally such blocks are opened in read/write mode (O_RDWR). The blocks thus transferred from the SE are placed in the UI output buffer BUF and then loaded into the UI GS3 cache.
The gs3_write(int fd, const void *buf, int c) primitive has the same parameters as gs3_read, with the only obvious difference that buf is an input parameter. This is an operation entirely performed locally to the UI, as shown in Fig. 10(b). The data blocks to be modified in the SE are temporarily saved into the file blocks cache. When the file is closed, renamed, moved or deleted, when the flush of the cache is forced, or when the gLite GS3 session is terminated, the data in the GS3FBC are synchronized with the corresponding data in the SE.
gs3_<op>(<file_id>, <params>) is a generic data I/O operation mapped onto the corresponding GFAL operation gfal_<op>(<file_id>, <params>). It has two optional parameters: the former, file_id, identifies the file through a URL or a descriptor, if required by the operation; the latter is a sequence of parameters, also depending on the type of operation. If a gs3_<op> modifies the file system structure (unlink, create, rename, rmdir, etc.), it is necessary to update both the corresponding blocks and the file index in the SE by forcing gs3_flush operations. Since this does not occur in every gs3_<op> call, the gs3_flush invocations and the blocks on which they operate are represented in Fig. 10(c) by dashed lines, corresponding to optional operations.

Fig. 10 GS3 data I/O primitives: gs3_read (a), gs3_write (b) and gs3_<op> (c).

Table 1 GS3 I/O primitives headers.
int gs3_access(const char *, int)
int gs3_chmod(const char *, mode_t)
int gs3_close(int)
int gs3_closedir(DIR *)
int gs3_create(const char *, mode_t)
int gs3_errmsg(char *, int, const char *)
off_t gs3_lseek(int, off_t, int)
int gs3_lstat(const char *, struct stat *)
int gs3_mkdir(const char *, mode_t)
int gs3_open(const char *, int, mode_t)
DIR *gs3_opendir(const char *)
ssize_t gs3_read(int, void *, size_t)
struct dirent *gs3_readdir(DIR *)
int gs3_rename(const char *, const char *)
int gs3_rmdir(const char *)
int gs3_stat(const char *, struct stat *)
int gs3_unlink(const char *)
ssize_t gs3_write(int, const void *, size_t)
int gs3_fsync(int)
int gs3_sync()

4.3.4 Termination

The main goal of the termination operation is the synchronization of data between the UI cache and the SE. This is implemented in GS3 by the gs3_finalize function. Since gs3_finalize always involves a gs3_flush, we detail such a primitive here as well. The gs3_flush(const char *filename) primitive specifies in filename the URL of the block to be synchronized in the SE (URL_BLK). It is invoked to force the flush of a generic block of data, including the GS3FI ones, from the UI to the SE. It is an internal operation, invoked by the GS3 system when executing primitives such as gs3_<op>, gs3_sync, gs3_fsync and gs3_finalize.
Fig. 11(a) shows the implementation of gs3_flush. The first action performed by such an algorithm is to verify whether the block to flush has already been created in the SE or exists only in the UI, through a gfal_stat call. In case it has already been created, since the GFAL libraries do not implement any file modification and/or rewriting capability, the corresponding file in the SE has to be deleted by invoking the gfal_unlink(const char *filename) primitive, specifying the block URL (URL_BLK) as filename. Then, in any case, a new file is created by a gfal_open invocation with the O_CREAT flag, allowing read/write to both the owner and the group (660 mode) since, anyway, the data are always encrypted. The block is then written into the SE by a gfal_write.
Fig. 11 The gs3_flush (a), gs3_fsync/gs3_sync (b) and gs3_finalize (c) primitives.
The dashed line of Fig. 11(a) represents the fact that the gfal_unlink operation is performed only if the file has already been created.
gs3_fsync(int fd) and gs3_sync() are the ways available for a user to force the flush of data locally stored in the UI to the SE. The former forces the flush of the GS3 file specified by the fd parameter, while the latter flushes the whole GS3FS. As shown in Fig. 11(b), both sync primitives follow the same algorithm: the modified blocks (BLK1[ ]) are first searched, among all the blocks in the case of gs3_sync, or among those referring to the specified GS3 file in the case of gs3_fsync. Then the algorithm invokes as many consecutive gs3_flush calls as there are modified blocks in BLK1[ ], plus one for the GS3FI.
The gs3_finalize() implementation, shown in Fig. 11(c), is similar to the gs3_sync one. It starts by checking the modified blocks stored in the GS3FBC. The SE files corresponding to the modified blocks (BLK1[ ]) are updated by invoking a gs3_flush for each of them, and then closed. Finally, the GS3FI is also updated in the SE through a gs3_flush call, and the corresponding file closed.
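An end-to-end UI session combining the primitives of this section would look roughly as follows. The header name, environment variable values and SE URL are hypothetical; the call sequence follows the initialization, I/O and termination phases described above.

```c
#include <fcntl.h>
#include <stdlib.h>
#include "gs3_api.h"   /* assumed header exposing the gs3_* primitives */

int main(void) {
    /* initialization phase: point GS3 at the SE and at the user's keys */
    setenv("GS3_PATH",   "srm://se.example.org/grid/myvo/user/gs3", 1);
    setenv("GS3_PUBKEY", "/home/user/.globus/usercert.pem", 1);
    setenv("GS3_PRVKEY", "/home/user/.globus/userkey.pem", 1);
    if (gs3_init(getenv("GS3_PATH")) < 0)   /* loads or creates K^i_PUB(K) */
        return 1;

    /* data I/O phase: POSIX-like calls, encryption handled transparently */
    const char msg[] = "hello, encrypted grid storage";
    int fd = gs3_open("/notes/hello.txt", O_CREAT | O_RDWR, 0600);
    gs3_write(fd, msg, sizeof msg);         /* buffered in the GS3FBC cache */
    gs3_close(fd);                          /* triggers the flush to the SE */

    char buf[64];
    fd = gs3_open("/notes/hello.txt", O_RDONLY, 0);
    gs3_read(fd, buf, sizeof buf);          /* blocks decrypted on the UI only */
    gs3_close(fd);

    /* termination phase: sync the cache, wipe the keys, leave the session */
    gs3_finalize();
    return 0;
}
```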
5 GS3 File System

As introduced in section 2.2, a widespread technique to implement Grid data storage is to organize data into distributed file systems. Most of the referenced solutions are built on top of a client-server architecture outside the middleware, which can be an already existing one, as in file systems extending NFS or other distributed FSs, or a new one specifically conceived for the proposed solution. The most direct solution for implementing a secure Grid file system is to use an encrypted filesystem over a distributed filesystem: systems like dm-crypt [14], TrueCrypt [57] or eCryptfs [41] can access the payload over (e.g.) NFS or AFS, which would provide similar functionality. In this way, user-file system interactions are remotely processed by the file system server, i.e. it is necessary that a file system module (NFS, AFS, GPFS, LUSTRE, custom defined FS, etc.) be always present and alive on the remote storage. Such an approach usually provides a trade-off among several challenges: performance, file sharing, authentication, authorization, access control management, etc. However, if the aim is more specifically restricted to a subset of such issues, the guarantees provided by such a generic solution usually cannot satisfy the stated requirements. With particular regard to security, to ensure high confidentiality and integrity of data and information in widely distributed environments such as the Grid, it is necessary to conceive a specific solution focused on such aspects, instead of adapting or using generic client-server solutions. In fact, in terms of security, a server can introduce further vulnerabilities into the system. Moreover, another drawback of this approach is that, in order to implement it, modifications to the Grid middleware are required. A more feasible and secure file system architecture is necessary for providing high information security to users accessing widely distributed Grid environments, for example by introducing data encryption as done in GS3.
Therefore, in order to enhance the usability of the Grid storage system, also taking into account the considerations made above, we implement the GS3 technique in the form of a specific encrypted file system, named GS3FS, ensuring a totally secure storage space to the users. GS3FS is implemented on top of the gLite middleware, which does not require any modification. In this way, the encryption mechanism and the remote access to the Grid storage elements are totally transparent to users. This allows the high security of the GS3 algorithm described in section 3 to be combined with the user friendliness of a POSIX file system.
More specifically, the implementation of the GS3FS is based on FUSE, as shown in Fig. 12. FUSE is a kernel module for Unix-like operating systems that allows non-privileged users to create their own file systems without requiring any modification to the kernel code. This is achieved by running the file system code in user space, while the FUSE module only acts as a “bridge”, an interface, between the user and the kernel space. Since only the user is involved in such a process, this ensures a high level of security. Our work is oriented and focused on the user; from this point of view we implement data sharing by sharing the whole file system namespace, which is also encrypted and therefore not accessible by unauthorized users. From the user's point of view, the GS3FS is thus considered and managed as a local file system.
By means of the FUSE interface, a user can mount a remote storage space, located on a Grid storage element, as a local device in the local file system, as shown in Fig. 12(a). The interface to GS3 files and directories is totally transparent to the user, being the same as that used for accessing local files or directories. The only difference is in terms of response times, since GS3 operations are usually performed remotely. Moreover, as described in section 4, the security features (confidentiality, integrity) have been implemented in GS3FS through cryptography mechanisms, in a way transparent to the users, who keep on accessing data in clear.
Fig. 12 GS3FS user access (a) and user architecture (b).
Always referring to Fig. 12(a), a user interacts with the GS3FS by mounting it in the local file system (at the GS3 mount point). Thus, the GS3FS appears as a common directory containing its files and subdirectories. The GS3 file I/O requests made by the user are directly managed and processed by the GS3FS, which retrieves the data and gives back the results. Fig. 12(b) details such a process: a generic user command (ls in the example) is notified to the operating system and processed by the virtual file system (VFS) operating in kernel mode. This latter forwards such requests to the FUSE module, which in its turn redirects the Linux VFS calls into user space. In this way the request, through the specific FUSE library (libfuse), reaches the GS3FS user library (GS3-API), which reformulates it as GFAL I/O operations remotely performed on the storage element. The results of such operations are then sent back to the user, following the reverse path.
Through FUSE, a user can use the same interface provided by the file systems supported by his/her Unix-like operating system, and the same system calls, for accessing both local and remote GS3 files and file systems. In this way, FUSE also introduces flexibility in the execution of binary applications over the Grid storage element. Commands such as vi, emacs, gcc, etc., can be used without recompiling them, and without modifying or applying any patch to the operating system kernel, since the system calls are redirected by the FUSE kernel module.
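A sketch of how the GS3FS daemon could wire libfuse callbacks to the gs3_* primitives of Table 1 is shown below. The fuse_operations fields are the real libfuse 2.x API; the mapping itself is an assumption based on Fig. 12(b), not the project's actual source, and it assumes the gs3_* calls return 0 or a negative error code as FUSE expects.

```c
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <stdlib.h>
#include <unistd.h>
#include "gs3_api.h"   /* assumed header exposing the gs3_* primitives */

static int gs3fs_getattr(const char *path, struct stat *st) {
    return gs3_stat(path, st);               /* forwarded to the GS3 library */
}

static int gs3fs_read(const char *path, char *buf, size_t size,
                      off_t off, struct fuse_file_info *fi) {
    (void)path;
    gs3_lseek((int)fi->fh, off, SEEK_SET);   /* fh set by an .open handler */
    return gs3_read((int)fi->fh, buf, size); /* decryption inside gs3_read */
}

static struct fuse_operations gs3fs_ops = {
    .getattr = gs3fs_getattr,
    .read    = gs3fs_read,
    /* .open, .write, .mkdir, ... map to the other Table 1 primitives */
};

int main(int argc, char *argv[]) {
    gs3_init(getenv("GS3_PATH"));            /* load the DataKey before mounting */
    return fuse_main(argc, argv, &gs3fs_ops, NULL);
}
```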
6 Performance
In order to evaluate the performance of the described GS3 gLite implementation, different tests have been executed. In the first, the performance of single write, read and
delete operations has been evaluated, as described in subsection 6.1. The other tests aim at evaluating real applications, such as Gcc and SQLite, in order to give a better picture of what end users can expect, as reported in subsection 6.2.
6.1 Operational Tests
Tests on single GS3 operations have been implemented by varying the file size, starting from 2^8 bytes and doubling it in each experiment up to 2^17 bytes; in total, therefore, we made 10 different tests. More specifically, these tests evaluate the performance of the GS3 primitives, with particular attention to the write, read and delete operations. The results thus obtained are compared with those obtained by evaluating the corresponding GFAL primitives. Moreover, we also compared them against the performance of an enhanced version of GFAL in which we only added the encryption feature to the original primitives (we call it CGFAL), and we made the same measurements on the local file system (LOCAL). As performance metric we considered the execution time, i.e. the time elapsed from the launch of the operation until the results are fed back to the user; we measured it for the GS3 write, read and delete primitives, varying only the file size. We executed the experiments with the purpose of testing the behavior of the GS3 primitives independently of the user application; such behavior is of course strongly affected by the cache, whose hit ratio depends on how the data are generated, that is, on the specific application. As a consequence, we deactivated the cache during our experiments, in order to ensure general results. To put these requirements and specifications into practice, in each test a file is first created/written and then read and deleted. Since these operations are always performed on new files, never opened before, the GS3 cache is never used or activated. This can be considered the worst-case evaluation of the GS3 primitives, in which operations are performed directly on the SE, without the benefits of the cache. In order to provide reliable measures, we repeated each test 1000 times, calculating the average of the results thus obtained. The graphs reported in the following plot the corresponding time measure against the file size on a logarithmic scale.
6.1.1 Write
The results obtained by evaluating the elaboration time of the write calls are reported in Fig. 13. They show similar trends for all the considered tests. As can be expected, the elaboration time of a write operation is strongly affected by the file size: as the size increases, the elaboration time increases proportionally. Since the slope depends on data transfer and storing times, it particularly impacts the GFAL, CGFAL and GS3 tests, where data transfer occurs between remotely networked hosts (UI-SE). Another characteristic common to the trends of these tests is the presence of a fixed time gap. Starting from this consideration, we can identify and separate two components in the GFAL, CGFAL and GS3 graphs shown in Fig. 13: the former, variable with the file size, depends on transfer and storing times, as discussed above; the latter is constant and does not depend on the data size. It is a fixed delay to pay in order to access data through the GFAL interface, spent opening a file by means of the GFAL API. By comparing the GFAL, CGFAL and GS3 behaviors, we can observe that, without considering the impact of the cache, GS3 is considerably slower than GFAL and CGFAL, and obviously slower than LOCAL calls.
Fig. 13 Performance comparison of the GS3, GFAL, CGFAL and LOCAL write operations.
This is due to the fact that, each time a GS3 write is performed, the file index stored in the SE must also be updated, and therefore two consecutive gfal_write operations are needed, as shown in Fig. 11. Another interesting observation, from the performance of the encrypted CGFAL and of the GFAL write operation, is that the time spent accessing the communication network is orders of magnitude greater than the computational time spent encrypting data. This justifies the performance gap between GS3 and the other storage systems: in the former case two network storage accesses, and therefore two consecutive GFAL open operations, are required, the first for storing the data, the second for storing the file index, while in the other cases only one access is needed. This is the cost of the file index table, whose impact on performance in combination with the cache is not evaluated here. However, the presence of the file index table is very important, even without a cache, since it enables the file modification/rewriting capability.
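The double remote access can be summarized by the following sketch, which reflects our description of Fig. 11 rather than the actual GS3 sources: encrypt_block() and serialize_index() are hypothetical helpers and the path conventions are illustrative, while gfal_open/gfal_write/gfal_close belong to the GFAL POSIX-like C API.

#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

/* GFAL POSIX-like primitives (gLite GFAL C API). */
extern int     gfal_open(const char *path, int flags, mode_t mode);
extern ssize_t gfal_write(int fd, const void *buf, size_t size);
extern int     gfal_close(int fd);

/* Hypothetical helpers: AES encryption of a data block and
 * serialization of the updated file index. */
extern size_t encrypt_block(const void *in, size_t len, void *out);
extern size_t serialize_index(const char *fs_root, void *out, size_t max);

/* A GS3-style write costs two remote accesses: one for the
 * encrypted data block, one for the updated file index. */
int gs3_style_write(const char *data_path, const char *index_path,
                    const void *buf, size_t len)
{
    static unsigned char enc[1 << 17], idx[1 << 16];
    size_t elen = encrypt_block(buf, len, enc);

    int fd = gfal_open(data_path, O_WRONLY | O_CREAT, 0600);  /* 1st access */
    if (fd < 0) return -1;
    gfal_write(fd, enc, elen);
    gfal_close(fd);

    size_t ilen = serialize_index("/gs3", idx, sizeof idx);
    fd = gfal_open(index_path, O_WRONLY | O_CREAT, 0600);     /* 2nd access */
    if (fd < 0) return -1;
    gfal_write(fd, idx, ilen);
    gfal_close(fd);
    return 0;
}

The same pattern explains the delete results of subsection 6.1.3: removing a GS3 file also triggers an extra gfal_open/gfal_write pair to rewrite the file index.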
6.1.2 Read
The performance obtained in the read tests is shown in Fig. 14. Also in this case two different contributions can be identified: the file size, affecting the read operations in all the considered tests, especially where data transfers involve the network; and a constant delay, present in gLite/GFAL environments (GFAL, CGFAL, GS3), due to opening the file and accessing the data through GFAL. In this case, however, the results of the GS3 read elaboration, without taking into consideration the impact of the cache, are comparable with the GFAL read ones, and also with those obtained by CGFAL. This is due to the fact that GS3 read operations do not have to update the file index, and therefore they make only one GFAL open/access operation, as in GFAL and CGFAL. The trends shown in Fig. 14 also confirm that the time spent in the encryption tasks is negligible with respect to the time spent in communication.
Fig. 14 Performance comparison of the GS3, GFAL, CGFAL and LOCAL read operations.
Fig. 15 Performance comparison of the GS3, GFAL, CGFAL and LOCAL delete operations.
6.1.3 Delete
Fig. 15 reports the results obtained by evaluating the delete operations. Unlike the read and write operations, the trends of the local and of the gLite/GFAL-based delete operations differ. The behavior of the local delete depends on the file size, as for the read and write operations. In the case of remote gLite/GFAL-based delete operations, instead, it is only necessary to open the file and then send the few bytes of the delete signal. Since such behavior is usually asynchronous, the sender does not need to wait for the delete completion. This is the reason why the performance of the GS3, GFAL and CGFAL delete operations does not depend on the file size, and only a constant delay is experienced. Similarly to the write case, we can observe a large gap between the GS3 performance and the others, due to similar motivations: a GS3 file deletion, as shown in Fig. 10(c), needs to update the SE file index after removing the file from the SE. This introduces a further GFAL file open and an extra gfal_write of the file index, increasing the overall elaboration time.
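A harness along the following lines could reproduce the per-operation measurements of this subsection (file sizes from 2^8 to 2^17 bytes, 1000 repetitions, averaged execution time). It is a sketch under stated assumptions rather than the benchmark actually used: gs3_write() is a hypothetical GS3 primitive, and a fresh file name is generated at every repetition so that the GS3 cache is never activated.

#include <stdio.h>
#include <time.h>

/* Hypothetical GS3 primitive under test. */
extern int gs3_write(const char *path, const void *buf, size_t len);

static double elapsed_s(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
    enum { REPS = 1000 };
    static char buf[1 << 17];               /* largest test size: 2^17 bytes */
    char path[64];

    for (size_t sz = 1 << 8; sz <= sizeof buf; sz <<= 1) {  /* 2^8 .. 2^17 */
        double total = 0.0;
        for (int r = 0; r < REPS; r++) {
            /* A new file every repetition: never opened before,
             * so the GS3 cache is never used or activated. */
            snprintf(path, sizeof path, "/gs3/bench-%zu-%d.dat", sz, r);
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            gs3_write(path, buf, sz);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            total += elapsed_s(t0, t1);
        }
        printf("size=%6zu B  mean write time=%.6f s\n", sz, total / REPS);
    }
    return 0;
}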
6.2 Real Applications Tests
Fig. 16 Performance comparison of Gcc compilation and SQLite operations of GS3 against local.
To test the GS3 behavior on real cases, we selected two heavily disk-consuming applications: Gcc (the GNU compiler) and SQLite (an on-file database). In particular, in the former case the compilation of files of a few KB through Gcc has been observed, while in the latter case SQLite insert, update and delete queries have been performed and evaluated. The performance obtained by using the full GS3 implementation has been compared against the performance of a modified GS3 version in which the cache has been disabled, and also against the performance obtained by local executions. In this way a (partial) evaluation of the cache impact on GS3 is provided, even though under the specific conditions specified above. The results thus obtained are shown in Fig. 16, where the execution time of the different operations is reported on the Y axis in logarithmic scale, while the test series, represented as histograms, are characterized on the X axis. The tests have been performed on the same testbed as the previous analysis, and also repeated 1000 times. The obtained results confirm the previous ones, in which we observed that a GS3 write operation is slow, mainly due to the network performance and to the overhead of the double delete-write operation implementing a rewriting. However, such graphs demonstrate that GS3 really works. Moreover, the comparison of the results obtained by the full GS3 implementation against the version without caching also demonstrates the effectiveness of the GS3 caching mechanism. This gives us confidence and motivation to further develop our work on GS3.
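Since GS3FS exposes a POSIX interface through FUSE, applications such as SQLite run against it unmodified: they simply operate on paths under the mount point. The fragment below illustrates this, assuming a hypothetical mount point /mnt/gs3; the SQLite calls (sqlite3_open, sqlite3_exec, sqlite3_close) are the standard C API, and the queries mirror the insert/update/delete tests of Fig. 16.

#include <sqlite3.h>
#include <stdio.h>

/* The database file lives on the (hypothetical) GS3FS mount point, so
 * every page SQLite reads or writes goes through FUSE -> GS3-API -> GFAL,
 * being transparently encrypted/decrypted on the way. */
int main(void)
{
    sqlite3 *db;
    if (sqlite3_open("/mnt/gs3/bench.db", &db) != SQLITE_OK) {
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        return 1;
    }
    sqlite3_exec(db,
                 "CREATE TABLE IF NOT EXISTS t(k INTEGER PRIMARY KEY, v TEXT);"
                 "INSERT INTO t(v) VALUES ('sample');"
                 "UPDATE t SET v = 'updated' WHERE k = 1;"
                 "DELETE FROM t WHERE k = 1;",
                 NULL, NULL, NULL);
    sqlite3_close(db);
    return 0;
}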
7 GS3 Application Scenarios
As a possible application scenario demonstrating the real effectiveness and usefulness of GS3, we consider the storage of huge amounts of data with the necessity of differentiating among such data according to security requirements, the so-called information classification. Classified information is sensitive information to which access is restricted by law or regulation to particular classes of persons. A formal security clearance is required to handle classified documents or access classified data, and the clearance process requires a satisfactory background investigation. There are typically several levels of sensitivity, with different clearance requirements. The US government in [56] specifies that information may be classified at one of the following three levels: top secret, secret and confidential. To this classification it is necessary to add a further unclassified level, which is the default and refers to information that can be released to individuals without a clearance. Similar classifications have been specified by other governmental and non-governmental subjects such as NATO, the UN, the EU, private corporations, etc. One of the most interesting and widespread is the traffic light protocol [9], which characterizes the security levels with colors: red identifies personal information (top secret); amber means limited distribution of data within the organization, on a 'need-to-know' basis only (secret); green characterizes data that can be circulated widely within a given community, but that may not be published or posted on the Internet, nor released outside of the community (confidential); and finally white identifies unlimited, unclassified, public data, subject only to standard copyright rules. The International Organization for Standardization has also worked on security topics, specifying the ISO/IEC 27000 family of information security standards, commonly known as ISO27k. In particular, ISO/IEC 27002 [31] is an internationally accepted standard of good practice for information security, followed by tens or hundreds of thousands of organizations worldwide. With regard to information classification, five levels are identified by ISO/IEC 27002:
– Top Secret - Highly sensitive internal documents and data. For example, impending mergers or acquisitions, investment strategies, plans or designs that could seriously damage the organization if lost or made public. Information classified as Top Secret has very restricted distribution indeed, and must be protected at all times. Security at this level is the highest possible.
– Highly Confidential - Information which is considered critical to the organization's ongoing operations and could seriously impede or disrupt them if shared internally or made public. Such information includes accounting information, business plans, sensitive customer information of banks and similar bodies, patients' medical records, and similar highly sensitive data. Such information should not be copied or removed from the organization's operational control without specific authority. Security should be very high.
– Proprietary - Procedures, project plans, operational work routines, designs and specifications that define the way in which the organization operates. Such information is usually for proprietary use by authorized personnel only. Security at this level is high.
– Internal Use Only - Information not approved for general circulation outside the organization, where its disclosure would inconvenience the organization or management, but is unlikely to result in financial loss or serious damage to credibility/reputation. Examples include: internal memos, internal project reports, minutes of meetings. Security at this level is controlled but normal.
– Public Documents - Information in the public domain: press statements, annual reports, etc., which have been approved for public use or distribution. Security at this level is minimal.
Since the classifications discussed above refer to any type of information, they can also be applied to discriminate among digital data. Our goal is to implement an adequate computing-storage system that puts such classifications into practice. In a Grid environment, an effective solution can be based on GS3 and on the GS3FS. In order to implement such a solution exploiting GS3, first of all it is necessary to identify as many security policies as the specified security levels. Starting from the ISO/IEC 27002 information classification, five security policies to be implemented by GS3 can be identified:
– Top Secret (Red) - a very restricted group of trusted users, down to only the owner, can exclusively access encrypted data. To ensure such exclusive data access, as many copies of the DataKey as the allowed users are stored in the storage element together with the data, each encrypted by the public key of the corresponding user. In case only the owner is allowed to access the data, only the copy of the DataKey encrypted by his/her public key is stored in the Grid.
– Secret/Highly Confidential (Amber) - a restricted group of well-known, trusted users can access the data, which also in this case are encrypted. As many copies of the DataKey used in the data encryption as the authorized users must be stored in the Grid storage element, each encrypted by the public key of the corresponding user.
– Proprietary (Brown) - a wide group of users can access the data, which however are always reserved and therefore encrypted. Community-wide accesses can be implemented in GS3 by creating community certificates distributed among all the users belonging to the community. Each of them must securely host the community private key locally on his/her node or user interface. The storage element stores a copy of the DataKey encrypted through the community public key.
– Internal Use Only (Green) - data can circulate within the Grid virtual organization, and are therefore stored in clear, without any encryption. However, in order to access them, a user has to authenticate him/herself into the virtual organization.
– Unclassified/Public Docs (White) - data are public and are therefore stored in clear, not encrypted, in the Grid storage element. Moreover, no authentication is required to access them.
In this way different users have different views of the whole dataset: the data masks are implemented through the presence or absence of copies of the DataKey encrypted by the public keys of the authorized users. Fig. 17 describes the dataset organization implementing the classification discussed above in a Grid environment by applying the GS3 technique. Such an implementation splits the dataset according to the information security level, generating five disjoint subsets characterized by different colors. The subsets grouping data according to their security level are implemented in GS3 through at least one GS3FS each. However, it is possible that data contained in a subset must be further classified, distinguishing among users' accesses, and must therefore be split and stored in more than one GS3FS. Through this characterization we want to describe how it is possible to implement the information classification scenarios described above, focusing on encryption and data security.
However, in order to really implement the proposed approach, it is also necessary to adequately solve important questions regarding access management (group and rights management, setting, revocation, etc.) and data sharing management. This is therefore a partial solution, covering the security aspects.
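The mechanism shared by the Red, Amber and Brown policies above is the wrapping of the symmetric DataKey under each authorized public key (a user key or the community key). The sketch below illustrates it with OpenSSL's EVP API; wrap_datakey(), make_and_wrap() and the implied layout of the wrapped copies in the SE are illustrative assumptions, not the GS3 implementation.

#include <openssl/evp.h>
#include <openssl/rand.h>

/* Wrap the symmetric DataKey once per authorized user: the SE then stores
 * one wrapped copy per user (or a single copy under the community public
 * key for the Proprietary/Brown policy). Illustrative sketch only.
 * On input, *wrapped_len must hold the size of the output buffer. */
int wrap_datakey(EVP_PKEY *user_pub,
                 const unsigned char *datakey, size_t key_len,
                 unsigned char *wrapped, size_t *wrapped_len)
{
    EVP_PKEY_CTX *ctx = EVP_PKEY_CTX_new(user_pub, NULL);
    if (ctx == NULL)
        return -1;
    int ok = EVP_PKEY_encrypt_init(ctx) > 0 &&
             EVP_PKEY_encrypt(ctx, wrapped, wrapped_len,
                              datakey, key_len) > 0;
    EVP_PKEY_CTX_free(ctx);
    return ok ? 0 : -1;
}

/* Example: generate a fresh 256-bit AES DataKey and wrap it for one user. */
int make_and_wrap(EVP_PKEY *user_pub,
                  unsigned char *wrapped, size_t *wrapped_len)
{
    unsigned char datakey[32];              /* AES-256 DataKey */
    if (RAND_bytes(datakey, sizeof datakey) != 1)
        return -1;
    return wrap_datakey(user_pub, datakey, sizeof datakey,
                        wrapped, wrapped_len);
}

Under such a layout, granting access amounts to storing one more wrapped copy of the DataKey, while revocation requires removing it and, prudently, re-encrypting the data with a fresh DataKey: precisely the access management questions mentioned above.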
Fig. 17 GS3 Application Scenario.
The example shown in Fig. 17 splits the data into five file systems hosted in the storage element: TS-GS3FS stores top secret data; HC-GS3FS contains highly confidential or secret data and files; proprietary and confidential data are structured and organized in the PR-GS3FS; internal data, accessible to users previously authenticated into the virtual organization, are stored in clear in the IUO-FS; and non-encrypted public data are placed in the unclassified/public FS (PD-FS), accessible without authentication. In this example we characterized top secret data as accessible exclusively by the owner, in order to distinguish the implementation of the TS-GS3FS from that of the HC-GS3FS storing highly confidential or secret data and files. Obviously, in case more than one user shares top secret data, the implementation of the TS-GS3FS is the same as that of the HC-GS3FS. In order to implement open accesses to Grid virtual organizations, in which a user must otherwise always be authenticated for interacting with the Grid resources, a mechanism based on robot certificates and a robot CA can be used. Thus an unidentified user who wants to access public data does not have to authenticate him/herself into the Grid, but can request a robot certificate from the robot CA. For example, public data can be published or accessed by websites, which automatically contact the robot CA for accessing the SE storing the required information. Following this specification, and distinguishing on information classification, in a community of users sharing the data of a GS3 storage system we can identify as many classes of users accessing the SE as there are file systems implementing the GS3 storage. In the example reported in Fig. 17, User A can access the whole dataset by locally mounting all the file systems: he/she is the only authorized user that can exclusively access top secret data, since the only copy of the TS-GS3FS DataKey stored in the storage element is encrypted by his/her public key.
Users A and B, together with a restricted and well-known group of users whose copies of the HC-GS3FS DataKey, encrypted by their public keys, are stored in the SE, can also access the highly confidential/secret data. Moreover, both A and B, together with all the users of the community, can also access the proprietary/confidential data, since they hold the community private key K_PRIV^COMM of the PR-GS3FS. All the users accessing the Grid virtual organization can, after authentication, also access the internal data stored in the IUO-FS. Furthermore, everybody can access the public data contained in the PD-FS. Through the FUSE implementation of the GS3FS, a user wishing to access parts of the dataset simply has to mount the remote GS3FS that contains the data of interest into the local file system, provided that a copy of the DataKey encrypting such data, encrypted by his/her public key, is present in the SE. In case the data are split into different file systems, the user has to mount all of them, if authorized to access the corresponding DataKeys. Thus, in the referred example, User A mounts all the file systems, B mounts HC-GS3FS, PR-GS3FS, IUO-FS and PD-FS, C mounts PR-GS3FS, IUO-FS and PD-FS, virtual organization authenticated users such as D can mount both IUO-FS and PD-FS, and unidentified users can mount the PD-FS or, alternatively, access the public data from the Web. Table 2 shows some examples of organizations very sensitive to information security, categorizing the data to be managed according to what has been described so far.
8 Conclusions
This work describes GS3, the Grid secure storage system implemented and integrated into the gLite middleware. The proposed security algorithm is based on the idea of combining symmetric and asymmetric cryptography. The symmetric cryptography is directly applied to the data, which are therefore always encrypted when stored on the storage elements. The symmetric key decrypting such encrypted data is in its turn encrypted by the user public key (asymmetric cryptography) and stored into the Grid remote storage system. Decryption is performed by the user interface node, and both the key and the data are allocated into unswappable memory pages of such node. In this way data can be accessed exclusively by the owner. In order to share such data with other users, it is necessary to store in the Grid storage copies of the DataKey encrypted by the public keys of such users. The GS3 implementation has been developed and integrated into the gLite middleware, specifying a secure file system (GS3FS) on top of the GFAL library. This implementation has been evaluated by mainly considering read, write and delete operations, providing satisfactory results also in the case of real applications (Gcc, SQLite). In this paper we also specify some guidelines for adapting and implementing the GS3 technique in real world applications. In particular we focus on those organizations that need to classify information according to security requirements; in such contexts we describe how it is possible to apply GS3 discriminating among different security levels. The technique and the approach we followed in the implementation of GS3 show several advantages:
– Security: encrypting the symmetric DataKey by the public keys of the authorized users ensures that data and DataKey are accessible only by such authorized users.
– Security standard: data, files and the file system structure are all encrypted.
– Library, API: a complete set of library functions is available, introducing new capabilities (file modification, rewriting, renaming, etc.) and optimizing the existing ones from the security point of view.
Table 2 reports, for each organization, example data at the five security levels (columns: Top Secret / TS-GS3FS, High Conf. / HC-GS3FS, Proprietary / PR-GS3FS, Internal / IUO-FS, Public / PD-FS):
– Government: Passwords, Exclusive Messages, Projects, ... | Reserved Info, Drafts, ... | Intra-Departmental Info, ... | Intra/Inter-Organization Info, ... | Laws, Public Directives, Press Releases, ...
– Public Admin.: Citizens' Personal Info, Passwords, ... | Reserved Info, Drafts, Initiatives, ... | ... | Internal Memos, Funds Info, ... | Announcements, Notifications, ...
– University: Student/Employers' Personal Records, ... | Unpublished Researches, E-mail, ... | Digital Libraries, Memos, ... | Internal Memorandums, ... | Research Results, ...
– Law: Evidences, Judges'/Investigation Offices' Info, ... | Courts, Lawyers, Investigation Departments Data, ... | Tribunal/Police Departments Data, ... | ... | Judgments, Arrest Warrants, Press Releases, ...
– Corporation: Investment Strategies, Designs, Plans, ... | Accounting Information, Business Plans, Sensitive Info, ... | Procedures, Project Plans, Operational Routines, Designs and Specifications, ... | Internal Memos, Projects/Meetings Reports, ... | Annual Reports, Press Statements, ...
– Medical: Personal Info, Passwords, PIN, Keys, ... | Patients' Data, ... | Inter-Departmental Patients' Metadata, ... | Intra-Departmental Patients' Metadata, ... | Public Patients' Data, Statistics, Balance, ...
Table 2 Example of GS3 Application Environments.
– File system interface: implementing GS3 as a file system (GS3FS) exploiting a FUSE interface allows users to access and manage remote GS3 data and files as local files, also using the same system calls and applications.
– Performance: the GS3 architecture is designed to satisfy specific performance requirements (file indexing, local file cache, etc.).
– Dependability and fault tolerance: since the DataKey is encrypted by the public keys of the authorized users, it is possible to replicate it, and also the data, in order to implement dependable and fault tolerant storage systems without loss of information security.
We will further extend GS3 by addressing the following aspects:
– Security: no one can access the contents of GS3 files, but site administrators can anyway physically erase them.
– Overhead: maintaining a local file system structure has the disadvantage that each time a file is closed, renamed, deleted, etc., the structure of the storage system must be updated, introducing an extra remote write operation.
– Consistency problems: enhanced techniques are necessary in order to adequately manage the consistency of information in data sharing contexts.
– Access control management: adequate policies of access control and rights management have to be developed in order to improve data sharing, also considering the management of X.509 certificates (CRLs, lifetime, etc.).
– Batch jobs: last but not least, in order to achieve the highest security level it is necessary to collect and manage the data in trusted nodes. In other words, the GS3 algorithm ensures the maximum level of security only when the data are decrypted and managed by the local host (UI) from which the user interacts with the Grid data storage. The same level of security cannot be ensured in case data are remotely processed by other, a priori unknown, nodes of the Grid or, equivalently, in case of batch job processing.
However, such disadvantages do not affect or compromise the validity of the approach, which provides a significant contribution to the related state of the art, as highlighted above. They constitute material for future work on GS3. Other interesting aspects we intend to further investigate are: security improvements, fault tolerance, Quality of Service, system optimization and the use of multiple storage elements.
Acknowledgements This work is partially supported by the EU-FP7 RESERVOIR project under grant #215605. The authors wish to thank the anonymous Reviewers, as well as the Editor-in-chief Péter Kacsuk, for their helpful comments and suggestions.
References
1. FUSE: Filesystem in Userspace. URL http://fuse.sourceforge.net/
2. Abbas, A.: Grid Computing: Practical Guide to Technology & Applications, 1st edn., chap. 8. Charles River Media (2003)
3. Andrews, P., Kovatch, P., Jordan, C.: Massive high-performance global file systems for grid computing. In: SC '05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, p. 53. IEEE Computer Society, Washington, DC, USA (2005). DOI 10.1109/SC.2005.44
4. Bhardwaj, D., Sinha, M.K.: GridFS: highly scalable I/O solution for clusters and computational grids. Int. J. Comput. Sci. Eng. 2(5/6), 287–291 (2006). DOI 10.1504/IJCSE.2006.014771
5. Blanchet, C., Mollon, R., Deleage, G.: Building an encrypted file system on the EGEE grid: Application to protein sequence analysis. In: ARES '06: Proceedings of the First International Conference on Availability, Reliability and Security, pp. 965–973. IEEE Computer Society, Washington, DC, USA (2006)
6. Brunie, L., Seitz, L., Pierson, J.-M.: Key management for encrypted data storage in distributed systems. In: IEEE Security in Storage Workshop, pp. 20–30. IEEE Computer Society, Washington, DC, USA (2003)
7. Chakrabarti, A.: Grid Computing Security. Springer-Verlag New York, Inc., Secaucus, NJ, USA (2007)
8. Chakrabarti, A., Damodaran, A., Sengupta, S.: Grid computing security: A taxonomy. IEEE Security & Privacy 6(1), 44–51 (2008). DOI 10.1109/MSP.2008.12
9. Directorate for Science, Technology and Industry - Committee for Information, Computer and Communications Policy: The development of policies for the protection of critical information infrastructures (CII). Tech. rep., Organisation for Economic Co-operation and Development (2006). http://www.oecd.org/dataoecd/25/10/40761118.pdf
10. EGRID Project: ELFI file system: EGEE Grid storage in a local filesystem interface
11. Foster, I.: What is the grid? - a three point checklist. GRIDtoday 1(6) (2002)
12. Foster, I., Kesselman, C., Nick, J., Tuecke, S.: The physiology of the grid: An open grid services architecture for distributed systems integration (2002). URL citeseer.ist.psu.edu/foster02physiology.html
13. Foster, I., Kesselman, C., Tuecke, S.: The anatomy of the grid: Enabling scalable virtual organizations. Int. J. High Perform. Comput. Appl. 15(3), 200–222 (2001)
14. Fruhwirth, C.: New methods in hard disk encryption. Tech. rep., Institute for Computer Languages, Theory and Logic Group, Vienna University of Technology (2005). URL http://clemens.endorphin.org/nmihde/nmihde-A4-ds.pdf
15. Ganguly, A., Agrawal, A., Boykin, P., Figueiredo, R.: WOW: Self-organizing wide area overlay networks of virtual workstations. Journal of Grid Computing 5(2), 151–172 (2007). DOI 10.1007/s10723-007-9076-6
16. Garfinkel, S.: PGP: Pretty Good Privacy. O'Reilly Media (1994)
17. Ghemawat, S., Gobioff, H., Leung, S.T.: The Google File System. SIGOPS Oper. Syst. Rev. 37(5), 29–43 (2003)
18. Global Grid Forum: Grid File System Working Group (GFS-WG)
19. Global Grid Forum Working Group: Grid File System Architecture Workbook v. 1.0 (2006)
20. GNU: GPG - GNU Privacy Guard - Documentation Sources - GnuPG.org. URL http://www.gnupg.org/documentation/
21. GridSite Project: SlashGrid: transparent Grid access to HTTP(S) servers
22. Grimshaw, A.S., Wulf, W.A., The Legion Team: The Legion vision of a worldwide virtual computer. Commun. ACM 40(1), 39–45 (1997). DOI 10.1145/242857.242867
23. Globus Security Infrastructure. http://www.globus.org/Security/
24. Honeyman, P., Adamson, W., McKee, S.: GridNFS: global storage for global collaborations. In: International Symposium on Mass Storage Systems and Technology, pp. 111–115 (2005). DOI 10.1109/LGDI.2005.1612477
25. Howard, J.H., Kazar, M.L., Menees, S.G., Nichols, D.A., Satyanarayanan, M., Sidebotham, R.N., West, M.J.: Scale and performance in a distributed file system. ACM Trans. Comput. Syst. 6(1), 51–81 (1988). DOI 10.1145/35037.35059
26. Hwang, K., Kwok, Y.K., Song, S., Cai, M., Chen, Y., Chen, Y., Zhou, R., Lou, X.: GridSec: Trusted grid computing with security binding and self-defense against network worms and DDoS attacks. In: International Workshop on Grid Computing Security and Resource Management (GSRM'05), in conjunction with ICCS 2005, pp. 187–195 (2005)
27. Hydra project. https://twiki.cern.ch/twiki/bin/view/EGEE/DMEncryptedStorage
28. IBM Corporation: Smart Storage Management with IBM General Parallel File System (GPFS). White paper (2009). ftp://ftp.software.ibm.com/common/ssi/pm/fy/n/clf03001usen/CLF03001USEN.PDF
29. Institute of Electrical and Electronics Engineers, Los Alamitos, CA, USA: The Authoritative Dictionary of IEEE Standards Terms, 7th edn. (2000)
30. International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC), Geneva, Switzerland: ISO/IEC 27002:2005 Standard: Information technology – Security techniques – Code of practice for information security management (2005)
31. ISO/IEC: ISO/IEC 27002:2005 - "Information technology - Security techniques - Code of practice for information security management" (2005). http://www.iso27001security.com/html/27002.html
32. Junrang, L., Zhaohui, W., Jianhua, Y., Mingwang, X.: A secure model for network-attached storage on the grid. In: SCC '04: Proceedings of the 2004 IEEE International Conference on Services Computing, pp. 604–608. IEEE Computer Society, Washington, DC, USA (2004)
33. Kesselman, C., Foster, I.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers (1998)
34. Maad, S., Coghlan, B., Quigley, G., Ryan, J., Kenny, E., O'Callaghan, D.: Towards a complete grid filesystem functionality. Future Gener. Comput. Syst. 23(1), 123–131 (2007). DOI 10.1016/j.future.2006.06.006
35. Menezes, A.J., Vanstone, S.A., Oorschot, P.C.V.: Handbook of Applied Cryptography. CRC Press, Inc., Boca Raton, FL, USA (1996)
36. Sun Microsystems: ZFS Learning Center. URL http://www.sun.com/software/solaris/zfs_learning_center.jsp
37. gLite Middleware Technical Committee: GFAL C API Description. CERN, Geneva
38. Montagnat, J., Frohner, A., Jouvenot, D., Pera, C., Kunszt, P., Koblitz, B., Santos, N., Loomis, C., Texier, R., Lingrand, D., Guio, P., Rocha, R.B.D., de Almeida, A.S., Farkas, Z.: A Secure Grid Medical Data Manager Interfaced to the gLite Middleware. Journal of Grid Computing 6(1), 45–59 (2008)
39. Nadalin, A., Kaler, C., Monzillo, R., Hallam-Baker, P.: Web Services Security: SOAP Message Security 1.1 (WS-Security 2004). OASIS Standard Specification, Web Service Security (WSS)-OASIS, 1.1 edn. (2006). http://docs.oasis-open.org/wss/v1.1/
40. Pacitti, E., Valduriez, P., Mattoso, M.: Grid data management: Open problems and new issues. Journal of Grid Computing 5(3), 273–281 (2007)
41. eCryptfs Project WebSite: eCryptfs - enterprise cryptographic filesystem (2010)
42. Rivest, R.L., Shamir, A., Adleman, L.M.: A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21(2), 120–126 (1978)
43. Rosenberg, J., Remy, D.: Securing Web Services with WS-Security: Demystifying WS-Security, WS-Policy, SAML, XML Signature, and XML Encryption. Sams (2004)
44. Sandberg, R., Goldberg, D., Kleiman, S., Walsh, D., Lyon, B.: Design and implementation of the Sun Network Filesystem. In: Proc. of Summer 1985 USENIX Conf., pp. 119–130. Portland, OR, USA (1985). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.473
45. Scardaci, D., Scuderi, G.: A secure storage service for the gLite middleware. In: International Symposium on Information Assurance and Security, pp. 261–266. IEEE Computer Society, Los Alamitos, CA, USA (2007)
46. Shamir, A.: How to share a secret. Commun. ACM 22(11), 612–613 (1979)
47. Stallings, W.: Cryptography and Network Security: Principles and Practice, 3rd edn. Pearson Education (2002)
48. Federal Information Processing Standards: FIPS Publication 197: Advanced Encryption Standard (AES). National Institute of Standards and Technology (NIST), USA (2001). http://csrc.nist.gov/publications/fips/fips197/fips197.pdf
49. Sun Microsystems, Inc.: Lustre File System – High-Performance Storage Architecture and Scalable Cluster File System. White paper (2008). URL https://www.sun.com/offers/docs/LustreFileSystem.pdf
50. Tatebe, O., Soda, N., Morita, Y., Matsuoka, S., Sekiguchi, S.: Gfarm v2: a grid file system that supports high-performance distributed and parallel data computing. In: Computing in High Energy Physics (CHEP) (2004)
51. Thain, D., Livny, M.: Parrot: An application environment for data-intensive computing. Journal of Parallel and Distributed Computing Practices (2004)
52. Thain, D., Moretti, C., Hemmes, J.: Chirp: a practical global filesystem for cluster and grid computing. Journal of Grid Computing 7(1), 51–72 (2009). DOI 10.1007/s10723-008-9100-5
53. Tilborg, H.C.v.: Encyclopedia of Cryptography and Security. Springer-Verlag New York, Inc., Secaucus, NJ, USA (2005)
54. Tipton, H.: Information Security Management Handbook, 5th edn. CRC Press, Inc., Boca Raton, FL, USA (2003)
55. Tu, M., Li, P., Yen, I.L., Thuraisingham, B., Khan, L.: Secure Data Objects Replication in Data Grid. IEEE Transactions on Dependable and Secure Computing (2010). To appear
56. US National Archives and Records Administration: Executive Order 13292 – Further Amendment to Executive Order 12958, as Amended, Classified National Security Information, vol. 68. US Federal Register (2003). http://www.archives.gov/isoo/policy-documents/eo-12958-amendment.html
57. TrueCrypt Project WebSite: TrueCrypt - free open-source disk encryption software (2010)
58. Weil, S., Brandt, S.A., Miller, E.L., Long, D.D.E., Maltzahn, C.: Ceph: A scalable, high-performance distributed file system. In: Proceedings of the 7th Conference on Operating Systems Design and Implementation (OSDI '06), vol. 7. USENIX (2006). URL http://www.ssrc.ucsc.edu/proj/ceph.html
59. Zhao, M., Figueiredo, R.J.: A user-level secure grid file system. In: SC '07: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, pp. 1–11. ACM, New York, NY, USA (2007). DOI 10.1145/1362622.1362683