A Distributed Storage System with dCache

G Behrmann(1), P Fuhrmann(2), M Grønager(1) and J Kleist(3)

(1) NDGF - Nordic DataGrid Facility, Kastruplundgade 22(1), DK-2770 Kastrup
(2) DESY - Notkestraße 85, D-22607 Hamburg
(3) NDGF and Aalborg University, Department of Computer Science, Selma Lagerlöfsvej 300, DK-9220 Aalborg SØ
E-mail: [email protected], [email protected], [email protected], [email protected]

Abstract. The LCG collaboration comprises a number of Tier 1 centers. The Nordic LCG Tier 1, operated by NDGF, is, in contrast to many other Tier 1 centers, distributed over the Nordic countries. A distributed setup was chosen for both political and technical reasons, but it also poses a number of unique challenges. dCache is well known and respected as a powerful distributed storage resource manager, and was chosen for implementing the storage aspects of the Nordic Tier 1. In contrast to classic dCache deployments, we deploy dCache over a WAN with limited bandwidth, high latency, frequent network failures, and spanning many administrative domains. These properties provide unique challenges, covering topics such as security, administration, maintenance, upgradability, reliability, and performance. Our initial focus has been on implementing the GFD.47 OGF recommendation (which introduced the GridFTP 2 protocol) in dCache and the Globus Toolkit. Compared to GridFTP 1, GridFTP 2 allows for more intelligent data flow between clients and storage pools, thus enabling more efficient use of our limited bandwidth.
1. Introduction
The Nordic Data Grid Facility (NDGF) is a virtual Core Infrastructure Center (CIC) that utilizes resources from more than 7 compute centers in 4 countries. The motivation for such a virtual center comes from the fact that even though the Nordic countries are considered one of the richest and most developed areas in the world, each country is quite small and its compute resources are divided among multiple geographical locations and organizational domains. The geographical area spanned by the NDGF infrastructure is about 1000 km by 1000 km.

A Core Infrastructure Center provides first and foremost a compute service and a storage service, and since the NDGF CIC is also a WLCG Tier 1, the storage service should be accessible through a single SRM [1] interface. Because of this distribution, instead of using a normal batch system for administering compute resources, the NDGF CIC uses the ARC [2] grid middleware to provide access to the resources. Furthermore, the storage system has to work seamlessly across multiple domains and wide area network (WAN) connections, and be able to integrate tape as well as disk systems into one unified storage area. Possible candidates for such a system were CASTOR [3] and dCache [4]; other systems providing the SRM interface are DPM [5] and STORM [6].
During the pilot phase of NDGF, several tests were conducted using DPM as well as dCache. These tests indicated that both DPM and dCache were usable over multiple domains and over a WAN. However, as dCache also supported tape storage and a good dialog with the development team was established, dCache became the obvious choice. CASTOR was found too hard to deploy at multiple domains, especially as the Nordic centers use several different Linux distributions and support for distributions other than Scientific Linux CERN 3 [7] was very limited. STORM, which is essentially an SRM layer for the GPFS [8] file system, was not found suitable to run over multiple domains either, as this would imply a multi-domain GPFS file system.

The long term vision for data storage for the NDGF CIC is similar to the grid vision for the compute service: no single points of failure and a high level of redundancy. dCache does not fully adhere to these goals, but it has many of the required properties and has the potential to evolve into the desired system.

This paper is structured as follows. Section 2 gives a brief overview of the NDGF dCache installation. Section 3 describes the challenges we met with such a setup, how some of them were overcome, and which problems remain open. Sections 4 and 5 go into detail on some of these problems and how they were solved, including the implementation of the extensions to the GridFTP protocol described in GFD.47. Finally, section 6 concludes the paper.

2. NDGF dCache installation
dCache uses a service oriented architecture (SOA), based on an in-house developed message passing layer, the cell layer. At one extreme, all services or components (called cells in dCache terminology) may run on a single host; a Tier 2 may decide to use such a setup. At the other extreme, each cell may run on a host by itself, communicating with the other cells via TCP; a very large Tier 1 may choose such a setup. Most deployments are somewhere in between the two extremes.

Slightly oversimplified, a normal dCache installation consists of any number of storage pools, a pool manager, a namespace manager, and a number of protocol doors (the picture is slightly more complicated, as there are also a number of utility components for administration, monitoring, etc.). Pools may optionally be connected to a hierarchical storage manager (HSM). The cells in a traditional dCache installation are connected via a local area network (LAN), with the protocol doors providing WAN access to the storage system, e.g., using protocols such as SRM, GridFTP, HTTP, and XROOT.

The NDGF installation differs by placing storage pools at various organizations, all connected to the same dCache system. Given that dCache cells communicate over TCP, the differences between our setup and a traditional setup are, from a configuration point of view, minor.

Our primary goal in the design was to ensure that NDGF will be able to take data 24x7 from the LHC accelerator. We therefore kept core cells like the pool manager and the namespace manager, which are single points of failure, together with all protocol doors at a single site in Copenhagen, Denmark, see Figure 1. This particular site was chosen since it is our end-point of the LCG OPN; thus, in case of a major disaster like catastrophic power loss (note that the site is well equipped with UPS and diesel generators), we would not be able to receive data anyway. There is no storage at this site, except for the namespace representation. Instead, storage is provided to us by 8 sites, of which 4 are currently deployed. These sites only run dCache pool cells, which are relatively simple to install, configure and upgrade.
Communication between the pool cells and our core dCache components is routed over the public IP network provided by NORDUnet and the NRENs in the Nordic countries, using firewalling to limit access to cell communication.
Figure 1. The NDGF dCache installation. Central components like the pool manager (dcache), the namespace manager (pnfs1 and pnfs2), and the protocol doors for GridFTP, SRM and XROOT (ftp1 and srm) are installed at a single site in Copenhagen, Denmark. This site is connected directly to the LHC OPN, via GEANT. Pools at several sites in the Nordic countries are connected to the dCache installation via the public IP network, some of them with an HSM attached, others without.

Although we provide installation guides for our storage owners, much of the pool design is left open. We recommend pool sizes of 5 to 10 TB (which we feel is a good compromise between granularity and administration overhead) and XFS or GPFS for the file system. Some sites have expensive fiber channel based SAN solutions, whereas other sites have direct attached SAS based RAID boxes. Some sites run with a very high storage density per host, currently up to 50 TB per host (although they plan to go as far as 100 TB per host), whereas others prefer having a single 7 TB pool per host. We recommend running dCache pools in a non-root user account, using either Java 5 or Java 6. We leave the choice of operating system to the site; although all sites currently use some form of Linux, they could use SUN Solaris or even MS Windows if they wanted to. Only two of our sites currently have an HSM, and both happen to use Tivoli Storage Manager. However, as long as we can make the HSM work with dCache, the choice of HSM is again up to the site.

dCache provides a high degree of flexibility with respect to data placement. It is quite possible to direct data in certain directories, or data written with a particular protocol, or data from a particular IP range, to a particular set of pools (see the sketch at the end of this section). At NDGF, we have decided against such fine grained control over data placement. Pools are assigned to specific VOs, and split into disk-only and tape-backed pools, but other than that, we do not attempt to group files (e.g. files in the same data set) on particular pools. The primary reason is that we have to be able to tolerate downtime on pools. Although NDGF has people on call 24x7, our sites do not. By not specializing pools further, we increase the likelihood that some pools are always available for writing. Obviously, disk-only files on pools that are down are unavailable. The primary goal of the design was, however, to be able to take data from CERN. Since our compute elements (which all use the ARC middleware) pre-stage input files, we do not expect brief periods of downtime on pools to be a major problem; the situation is not unlike having to wait for a file to be staged from a busy HSM.
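To make the kind of steering discussed above concrete, the following sketch models, in plain Java, how a write request could be routed to a pool group based on directory prefix, protocol, or client IP range. It is only an illustration of the concept; it is not dCache code, and dCache expresses such rules in its pool manager configuration rather than in Java. All group names, pool names, and rules are invented.

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a toy model of directing writes to pool groups by
// directory, protocol, or client network. This is not dCache code; the
// group names, pool names, and rules are invented for the example.
public class PoolSteeringSketch {

    private final Map<String, List<String>> groups = new LinkedHashMap<>();

    PoolSteeringSketch() {
        groups.put("atlas-tape", List.of("pool-nbi-1", "pool-nbi-2"));
        groups.put("atlas-disk", List.of("pool-uio-1", "pool-csc-1"));
        groups.put("default",    List.of("pool-dcsc-1"));
    }

    // First matching rule wins; the last rule is a catch-all.
    String selectGroup(String path, String protocol, String clientIp) {
        if (path.startsWith("/atlas/raw/"))  return "atlas-tape"; // data bound for tape
        if (protocol.equals("GridFTP"))      return "atlas-disk"; // written via a GridFTP door
        if (clientIp.startsWith("130.225.")) return "atlas-disk"; // a particular client IP range
        return "default";
    }

    public static void main(String[] args) {
        PoolSteeringSketch s = new PoolSteeringSketch();
        String group = s.selectGroup("/atlas/raw/run1234", "GridFTP", "130.225.1.7");
        // At NDGF the grouping deliberately stops at VO and disk/tape, so that
        // any available pool in the group can accept the write.
        System.out.println("write goes to group " + group + ": " + s.groups.get(group));
    }
}

In the NDGF setup the rule set effectively ends at the VO and the disk/tape split, which is what lets writes proceed as long as any pool in the group is up.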
3. Challenges
The NDGF dCache installation may at first glance appear to be quite similar to other dCache installations. Compared to some of the large WLCG Tier 1 installations, the NDGF installation is even quite modest. The crucial difference, however, lies in the fact that dCache at NDGF uses a WAN for internal dCache communication. Contrast this to a traditional installation, in which dCache internal communication is restricted to a local area network (LAN), often a single subnet, and in which the WAN is only used for external access from the grid. Although TCP/IP does a good job at hiding the differences between a LAN and a WAN at the API level, it cannot hide the fact that a WAN, in contrast to a LAN, is characterized by:
• Limited bandwidth
• High latency
• Frequent network failures
• Multiple administrative domains
Each of these properties introduces challenges for a distributed storage solution. This section briefly describes at least some of these challenges. We do not claim completeness, nor do we describe solutions to all problems. The purpose is rather to serve as inspiration for things to address and as an indication of what NDGF will contribute to dCache.

3.1. Security
Since NDGF dCache pools are hosted by sites at different organizations located in four countries, and since communication between pools and the central dCache components is routed over public networks, security is more challenging in a distributed dCache installation than in a local one.

Local rules at the sites may make the installation of a dCache pool difficult. Possible issues include regulations about open ports in the site firewall or about installing untrusted services as root. With respect to the first issue, dCache pools establish connections for three purposes: communication with the central dCache components, communication with clients, and pool to pool transfers. The first will always be an outgoing connection and is thus unlikely to be a problem. The last two are established on configurable port ranges, which certainly helps in convincing the local security experts to open their firewall. With respect to the second issue, dCache 1.7 can operate as a non-root user, although the installation procedure is more complicated in this case. The fact that dCache is a pure Java application also reduces the possible attack vectors.

In some cases national rules may also prove problematic. Finland, for instance, has quite strict rules about data export. In a distributed dCache setup, data transfers will be logged centrally by the billing cell, which is located outside of Finland. Although we do not believe this to be illegal, such issues have to be considered.

dCache is internally implemented as a distributed application, using an in-house developed message passing system called the cell layer. The cell layer predates many other message passing systems for Java, and has served dCache well. However, since a traditional installation of dCache is limited to a single site, the communication is in dCache 1.7 unencrypted and unauthenticated. This is for obvious reasons not acceptable for a dCache WAN deployment. To address this problem, dCache 1.8 introduces the possibility to use SSH1 for internal dCache communication. Since SSH1 is not considered a secure protocol, NDGF is considering contributing a patch to enable SSL for internal dCache communication. Until this has been implemented, NDGF relies on firewalls to protect the relevant ports.
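Such a patch would essentially tunnel the existing cell TCP connections through TLS. The sketch below is not dCache code and says nothing about how dCache's cell layer is actually structured; it only illustrates, with the standard javax.net.ssl API, how an outgoing connection from a pool to the central cell host could be wrapped in SSL. The host name and port are placeholders, and both ends would need appropriately configured key and trust stores.

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Illustration only: wrapping a pool-to-core connection in TLS. The host
// name and port below are placeholders, not actual NDGF values.
public class SecureCellConnectionSketch {
    public static void main(String[] args) throws Exception {
        String coreHost = "dcache.example.org"; // placeholder central cell host
        int cellPort = 11111;                   // placeholder cell communication port

        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(coreHost, cellPort)) {
            socket.startHandshake();            // authenticate the peer and encrypt the channel
            OutputStream out = socket.getOutputStream();
            // From here on, the ordinary cell message stream would flow over
            // the encrypted channel instead of a plain TCP socket.
            out.write("cell message placeholder".getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }
}

The key and trust material could be taken from the standard javax.net.ssl system properties, or the grid host certificates already present at each site could be reused.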
Traditionally, dCache has relied heavily on a system called PNFS. PNFS is a server implementation of the NFS2 protocol, and prior to dCache 1.7 the exported file system was required to be mounted on all dCache hosts. In dCache 1.7, this has been reduced to some of the central components, some protocol doors, and pools writing files to attached HSM systems. Due to the design of NFS2, mounting an NFS file system remotely across security domains is unacceptable. For this reason, NDGF initially operated without HSM backends. The problem is known to the dCache developers, and they have systematically reduced the use of PNFS in dCache. In dCache 1.8 it is possible to completely replace PNFS with a new namespace provider called Chimera, which addresses the security concerns with NFS2. Even without Chimera, dCache 1.8 pools no longer need to mount the PNFS file system.

3.2. Administration
NDGF storage resources are not owned by NDGF. Most of the storage systems are not even dedicated to NDGF purposes, and must be shared with site local use. NDGF therefore has no direct administrative access to the systems hosting our dCache pools; in most cases not even a user account. This provides a number of challenges for us and for our sites.

Our sites are worried that they may lose too much control over their storage system. Most dCache administration is performed via the dCache administrative interface. The access control system is, however, not flexible enough to limit access to the parameters relevant to the site in question. Short of giving site administrators access to all of dCache, we are forced to shut them out of dCache administration tasks. dCache 1.8 provides improved access control mechanisms, but NDGF has not yet investigated whether this solves our problems.

Due to resources being shared, many sites wish to also use dCache for local purposes, i.e. storage not pledged to the NDGF Tier 1, but part of the same storage system. It would be convenient if this storage could be exposed as additional dCache pools using the same distributed dCache installation. However, due to the aforementioned problems with providing administrative access, and due to increased reliability problems (in both directions: sites would be exposed to failures of NDGF's central dCache components, and NDGF would be exposed to increased load from non-Tier 1 transfers), we recommend that sites simply maintain a parallel dCache installation for local purposes. It is quite possible to run pools from several unrelated dCache installations on the same host, although the installation procedure once again becomes more complicated.

NDGF has no control over which type of hardware is bought by our sites, nor which operating systems are used. Fortunately, dCache is written in Java, and dCache pools can be installed on any platform with a recent version of Java.

Upgrading a large dCache installation has always been challenging. Between non-bugfix releases, the upgrade of all dCache components has to be synchronized, e.g., it is not possible to run a central dCache 1.8 installation with dCache 1.7 pools. In the NDGF dCache installation, the upgrade procedure is even more challenging, as pools must be upgraded by the site administrators.
A synchronized upgrade requires us to find a common time window in which all sites can provide the necessary human resources to upgrade their installation. Bugfix releases can often be installed without bringing the whole installation down, which at least allows us to resolve bugs without too much hassle.

3.3. Reliability
dCache is touted for its resilience against pool failures. Except for the unavailability of files that happen to be stored only on failed pools, a pool failure has no ill effect on a dCache installation.
The central dCache components, however, are single points of failure. This was a deliberate choice by the dCache developers, as such a design is much simpler to implement and maintain than a replicated setup. In particular, stateful components like the namespace representation are tricky to replicate. One may attribute much of the success of dCache to its relatively simple design, and NDGF concurs that this was a wise choice. Compared to a traditional installation, NDGF's dCache installation is neither more nor less susceptible to failure of the central dCache components, and for the purposes of WLCG, NDGF is happy with the current design. We do, however, realize that to achieve our long term goal of a fully redundant distributed data grid without single points of failure, this problem needs to be addressed.

A WAN installation of dCache introduces failure modes that are unlikely on a LAN. For instance, network separation is unlikely on a LAN, but almost a daily occurrence on a WAN. Network separation will temporarily interrupt the internal dCache communication infrastructure. As is to be expected in such cases, new bugs and problems were revealed. There is no point in providing a complete list here, but to illustrate our point, we would like to share one such problem: when a file repeatedly fails to be accessible, dCache interprets this as a serious problem with either the tape system or the pool hosting the file. As a result, dCache suspends all access to the file until an operator has investigated the problem. In the presence of network separation, 3 retries, and a retry period of 15 minutes, a file would be suspended if the separation lasted more than 45 minutes. Although technically not a bug, the behavior was inappropriate when deployed on a WAN, and NDGF has contributed a fix for dCache 1.8.

3.4. Performance
Throughput in dCache is excellent and scales with the number of pools as long as network capacity is available and pools can be kept busy with transfers. The dCache developers are currently addressing scalability limits in the frequency of file operations, but such limitations are in no way related to WAN deployment of dCache. There are, however, two other performance issues which, although not unique to NDGF, become more serious in our setup. The first is related to the FTP protocol and will be explained in detail in Section 4. The second is the absence of a network model in dCache. To dCache, all pools and all protocol doors are equal with respect to the network. When deploying pools and maybe even protocol doors in several countries, this is no longer true. Some pools may be a lot closer to the client than others; the same is true when replicating files as a result of hot-spot detection. dCache does allow steering the pool selection according to the client IP address, but doing so for our purposes is not particularly scalable from an administrative point of view, and is not even desirable: distance and bandwidth should not be used to make definitive decisions over data placement, but should influence the pool selection on the same level as the load and free space of a pool do today (the same observation can be made for the selection of a protocol door).
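To make the idea concrete, the sketch below folds a network term into a pool selection cost alongside load and free space. This is purely an illustration of the kind of cost function we have in mind; it is not dCache's cost model, and the weights and the round-trip-time based distance estimate are invented.

// Illustration only: folding a network "distance" into pool selection cost,
// alongside load and free space, rather than using it as a hard filter.
// The weights and the distance metric are invented for the example.
public class NetworkAwareCostSketch {

    static class Pool {
        final String name;
        final double load;         // 0.0 (idle) .. 1.0 (saturated)
        final double freeFraction; // fraction of pool space still free
        final double rttMillis;    // measured RTT from the client's site

        Pool(String name, double load, double freeFraction, double rttMillis) {
            this.name = name;
            this.load = load;
            this.freeFraction = freeFraction;
            this.rttMillis = rttMillis;
        }
    }

    // Lower cost is better; the network term competes with, but does not
    // override, the load and space terms.
    static double cost(Pool p) {
        double spaceCost = 1.0 - p.freeFraction;
        double networkCost = p.rttMillis / 100.0; // normalise: roughly 100 ms == cost 1.0
        return 1.0 * p.load + 1.0 * spaceCost + 0.5 * networkCost;
    }

    public static void main(String[] args) {
        Pool near = new Pool("pool-uio-1", 0.8, 0.2, 2.0);  // close but busy and nearly full
        Pool far  = new Pool("pool-csc-1", 0.1, 0.7, 30.0); // distant but idle
        Pool best = cost(near) <= cost(far) ? near : far;
        System.out.println("selected " + best.name);
    }
}

The point of the weighting is that a nearby pool can still lose to a distant one if it is busy or nearly full, which is exactly the behavior a hard, IP-based steering rule cannot express.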
4. Implementing GFD.47
The GridFTP protocol was introduced in 2003 as an Open Grid Forum (OGF) recommendation in GFD.20 [9]. Besides requiring the use of an encrypted control channel, it extends the FTP protocol with features such as striping (more than two hosts participate in the data transfer, breaking the limits of a single NIC), parallel streams (multiple TCP connections between the same two hosts, thus working around throughput limitations in some TCP/IP stacks), partial file transfers, TCP buffer size negotiation, and instrumentation. In particular the use of parallel streams and GSI authentication has made the protocol popular, making it the most widely used data transfer protocol in WLCG.
Figure 2. In classic FTP, setting up a passive data transfer requires at least two commands on the control channel: the first sets up the passive transfer, and the second initiates it. A distributed storage system like dCache faces the dilemma that it has to respond with the address of the data endpoint in response to the first command. Not knowing which file will be transferred, and thus which pool to use, it has to use the door as the data endpoint and then proxy the data connections.

A few months after GFD.20, GFD.21 [10] pointed out limitations and problems in GFD.20. Some problems were introduced by GFD.20, and others were inherited from the FTP protocol. To address these problems, GFD.47 introduced the GridFTP v2 protocol in 2005. Since then, the protocol has remained in draft status, without any implementations. Until now, that is.

When NDGF decided to deploy dCache as a single installation spanning four countries, we immediately realized that we would have a problem with FTP transfers. In dCache, FTP is implemented via gateways called doors. The client establishes an FTP control channel to the FTP door, requesting the transfer of a file. As is usual with FTP, the file itself is transferred over a separate data channel. Although it is possible to establish the data channel between other hosts than those handling the control channel (some may consider this a hack, but recall that FTP was designed to handle third party transfers, in which the data channel is not established to the client who created the control channel; also, GFD.20 introduced striping, in which multiple hosts may establish data channels), so called passive transfers, in which the client establishes the data channel to the server, will in dCache always flow through the FTP door, see Figure 2. This is due to limitations in the FTP control channel protocol. The consequence is that each passive transfer effectively hits the network twice. This may be a mere annoyance in a traditional data center, but it is quite unfortunate in the NDGF dCache setup: a transfer from one of our compute elements to a dCache pool that might happen to be located right next to it could end up going to the door in a different country and back. To make matters worse, deciding to use active transfers (in which the server rather than the client establishes the data connection) is not an option. First of all, clients may be behind firewalls which, due to the use of encryption on the control channel, cannot open a port for the reverse connection. Second, GridFTP introduced the extended block mode data channel protocol in order to support parallel streams and striping. To avoid a race condition in the end of file negotiation, the protocol was limited such that the sender would always be the one establishing the data channel, i.e., a receiving server would have to be passive. This particular problem happens to be solved by GFD.47, which explains our interest in the new protocol.
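The two-step exchange of Figure 2 can be seen directly on the control channel. The sketch below drives it with a plain Java socket, using the commands from the figure; a real GridFTP client would authenticate and encrypt the control channel with GSI, which is omitted here, and the host name and reply handling are illustrative only.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Illustration only: the two-command passive upload of classic FTP/GFD.20,
// using the commands from Figure 2. GSI security on the control channel is
// omitted, and the replies are simply printed rather than parsed.
public class ClassicPassiveUploadSketch {
    public static void main(String[] args) throws Exception {
        try (Socket control = new Socket("ftp1.ndgf.org", 2811); // GridFTP control port
             Writer out = new OutputStreamWriter(control.getOutputStream(), StandardCharsets.US_ASCII);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(control.getInputStream(), StandardCharsets.US_ASCII))) {

            System.out.println(in.readLine()); // server greeting

            // Step 1: request a passive data endpoint. The server must answer
            // now, before it knows which file (and hence which pool) is
            // involved, so it can only hand out the door's own address.
            out.write("PASV\r\n");
            out.flush();
            System.out.println(in.readLine()); // an address on ftp1.ndgf.org

            // Step 2: only now does the server learn the file name, too late
            // to redirect the data channel; the door has to proxy the data.
            out.write("STOR /atlas/AOD.23213\r\n");
            out.flush();
            System.out.println(in.readLine());
        }
    }
}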
Figure 3. GFD.47 introduces the GETPUT command set, which allows all relevant transfer parameters to be specified in a single command. This allows dCache to select a proper pool before having to respond with the address of the data endpoint, so that a direct data connection between the client and the pool can be established.

The solution provided by GFD.47 is twofold. First, GFD.47 introduces new commands for requesting data transfers: GET and PUT. In contrast to the FTP RETR and STOR commands, GET and PUT take a parameter list containing all relevant information about the transfer. In particular, this makes it possible to request a passive upload with a single command rather than two, as would be the case with normal FTP. This in turn allows the server to decide on a destination pool for this particular file and send the destination address back to the client, which can then establish a data channel directly to the pool, see Figure 3. This is known as the GETPUT feature of GFD.47.

Second, GFD.47 introduces a new extended block mode data channel protocol, known as MODEX. This protocol resolves the race condition of the GFD.20 extended block mode data channel protocol, while at the same time introducing strong checksum verification on data blocks and the ability to multiplex several files on the same data channel. MODEX removes the coupling between transfer direction and data channel establishment.

GFD.47 provides other features, including dynamic network resource allocation and exchange of checksum information.
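For comparison with the classic exchange shown earlier, the sketch below performs the same upload with a single GFD.47 PUT command, using the command string from Figure 3. Again, GSI security is omitted, and the reply handling is only a placeholder: the exact reply syntax is defined in GFD.47, and here we simply assume that the reply carries the pool's address.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Illustration only: a single-command GFD.47 upload, as in Figure 3. The
// reply parsing is a placeholder; consult GFD.47 for the actual syntax.
public class GetPutUploadSketch {
    public static void main(String[] args) throws Exception {
        try (Socket control = new Socket("ftp1.ndgf.org", 2811);
             Writer out = new OutputStreamWriter(control.getOutputStream(), StandardCharsets.US_ASCII);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(control.getInputStream(), StandardCharsets.US_ASCII))) {

            System.out.println(in.readLine()); // server greeting

            // One command carries both the file name and the request for a
            // passive endpoint, so the server can pick a pool first and reply
            // with the pool's own address, e.g. pool1.ndgf.org:20000.
            out.write("PUT file=/atlas/AOD.23213;pasv;\r\n");
            out.flush();

            String reply = in.readLine(); // reply containing the data endpoint
            System.out.println(reply);
            // The client would now open the data channel directly to the pool
            // address taken from the reply, bypassing the door entirely.
        }
    }
}

The saving is not in the number of commands as such, but in the fact that the data no longer has to be proxied through the door.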
In the spring of 2007, NDGF contributed implementations of GETPUT and MODEX to dCache 1.8. Since a server implementation of GFD.47 is of no use without clients, NDGF contributed client implementations of the GETPUT feature to the Globus Toolkit 4, which we have been promised will be included in version 4.1. These patches are as of version 0.6.1 shipped with the ARC middleware, which is the primary grid middleware used by NDGF. We have since decided to backport these additions to Globus 2.4.3, since it is used in the current version of gLite, and thus in FTS, the File Transfer Service. With ARC and FTS covered, the most important clients used at NDGF have been GFD.47 enabled. To cover remaining clients, we have contributed an implementation of GETPUT to the COG JGlobus Java library. We should add that Fermilab implemented the checksum extensions of GFD.47 in dCache 1.8 and in the client libraries of the Globus Toolkit 4.1. Table 1 gives an overview of GFD.47 implementations.

It is important to point out that the implementation of GFD.47 in dCache 1.8 is the first implementation of the GridFTP v2 protocol. Although the GridFTP daemon shipped with the Globus Toolkit has reached release version 2, it only implements the GridFTP v1 protocol as defined in GFD.20. This has historically led to quite some confusion about what exactly is referred to by the term GridFTP v2.

As a result of these development efforts, GFD.47 has been put on the agenda for the OGF21 meeting, and we hope that the renewed interest in the protocol will eventually lead to the acceptance of the protocol as an OGF recommendation.

Tool                   Component           Native         Features
Globus Toolkit 2.4.3   Client              As patch       GETPUT
Globus Toolkit 4.0     Client              As patch       GETPUT
Globus Toolkit 4.1     Client              Yes            GETPUT, CHKSUM
COG JGlobus 1.4        Client              As patch       GETPUT
dCache 1.8             Client and server   Yes            GETPUT, CHKSUM, MODEX
ARC 0.6.1              Client              Yes            GETPUT
FTS 2 and gLite        Client              Via GT patch   GETPUT

Table 1. Overview of GFD.47 implementations.
5. Supporting multiple HSM instances
In contrast to other WLCG Tier 1 centers, NDGF has access to many HSM systems for use with dCache. This has provided a challenge for the deployment of dCache, since dCache 1.7 assumes that all pools with HSM access have equal access to all HSM systems. This may be true in a traditional data center, but is most certainly not true for NDGF. This problem was first realized by the dCache developers when we approached them about deployment of dCache at NDGF. Together, we developed the following simple solution, which is now part of dCache 1.8.

HSM integration in dCache is performed via backend scripts. These scripts are called by the dCache pool code whenever dCache wants to flush a file to tape or wants to stage it back to disk. Pools autonomously decide to flush files to tape. In contrast, the decision to restore a file from tape is made centrally by the pool manager when the file is accessed. In dCache 1.7, the pool manager has no knowledge about whether a pool is indeed attached to the particular HSM, only whether a pool is allowed to be used for reading from tape in general.

In dCache 1.8, each HSM is assigned a unique name. Whenever a file is flushed to tape, the pool notifies the namespace component of dCache about the flush and includes the HSM instance name in the message. This information is recorded as meta data for the file, i.e., in PNFS or Chimera, see Figure 4. Pools periodically broadcast information about their status to other dCache components, in particular to the pool manager. In dCache 1.8, information about attached HSM systems has been added to this status information, which now allows the pool manager to select a pool for stage operations which is indeed attached to the named HSM system. This is not in any way limited to the pool which originally wrote the file to tape: any pool with a configured HSM using this particular HSM instance name is eligible for selection, subject to cost constraints on the current load and fill rate of the pool. It should be pointed out that this addition was jointly developed by NDGF and DESY.

Figure 4. Messages involved when flushing a file to tape and reading it back in: (s) periodic status update messages from the pools NBI1 and NBI2 to the pool manager, including information about the attached HSM system called NBI; (1) the file is flushed to tape; (2) the PNFS manager is informed about the flush and the HSM instance name is recorded in the meta data of the file; (3) an FTP door requests the file; (4) the pool manager requests the storage information from the PNFS manager; (5) the PNFS manager replies with the storage information; (6) assuming the pool NBI1 has erased its cache copy of the file, the pool manager may tell the pool NBI2 to stage the file from the attached HSM system; (7) the file is read back to disk.
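The selection step can be summarized in a few lines of code. The sketch below is not dCache's implementation; the types, names, and cost values are invented. It only models the mechanism described above: pools advertise the HSM instance names they are attached to in their status messages, the namespace records the instance name at flush time, and the pool manager considers only pools advertising that name when scheduling a stage.

import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Illustration only: choosing a pool for a stage request based on the HSM
// instance name recorded with the file, in the spirit of dCache 1.8. The
// record types, names, and cost function are invented for the example.
public class HsmAwareStageSelectionSketch {

    // A pool's periodic status message, including its attached HSM instances.
    record PoolStatus(String pool, Set<String> attachedHsmInstances, double cost) {}

    // File meta data as recorded in the namespace (PNFS or Chimera) at flush time.
    record FileRecord(String fileId, String hsmInstance) {}

    static Optional<String> selectStagePool(FileRecord file, List<PoolStatus> statuses) {
        return statuses.stream()
                .filter(s -> s.attachedHsmInstances().contains(file.hsmInstance()))
                .min(Comparator.comparingDouble(PoolStatus::cost)) // load and fill rate
                .map(PoolStatus::pool);
    }

    public static void main(String[] args) {
        // NBI1 wrote the file to the HSM instance "NBI", but NBI2 is attached
        // to the same instance, so either pool may serve the stage request.
        List<PoolStatus> statuses = List.of(
                new PoolStatus("NBI1", Set.of("NBI"), 0.9),
                new PoolStatus("NBI2", Set.of("NBI"), 0.2),
                new PoolStatus("UIO1", Set.of("UIO"), 0.1));

        FileRecord file = new FileRecord("example-file-id", "NBI");
        System.out.println(selectStagePool(file, statuses).orElse("no eligible pool"));
    }
}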
6. Summary
The NDGF storage services have been operational since January 2007 and are fully integrated into the WLCG infrastructure. Even though a WAN deployment of dCache is challenging, the setup has worked far better than we had dared hope. NDGF still needs to ramp up with additional storage resources, and more sites will be added over time. We are confident that dCache is up to the task.

NDGF will continue to contribute to dCache. Section 3 outlined the challenges we are facing, and NDGF will systematically work to address them. This is not a small task, and will keep us busy for years to come. In particular, addressing single points of failure in dCache will be an interesting and challenging task; it is, however, an important step for dCache to take in order to be a viable long term solution for NDGF's storage requirements.

References
[1] Sim A, Shoshani A, Perelmutov T, Petravick D, Corso E, Magnoni L, Gu J, Badino P, Barring O, Baud J P, Donno F, Litmaath M, De Witt S, Jensen J, Haddox-Schatz M, Hess B, Kowalski A and Watson C 2007 The Storage Resource Manager interface specification, version 2.2, http://sdm.lbl.gov/srm-wg/doc/SRM.v2.2.html
[2] Smirnova O, Eerola P, Ekelöf T, Ellert M, Hansen J, Konstantinov A, Kónya B, Nielsen J, Ould-Saada F and Wäänänen A 2003 International Conference on Computational Science ICCS 2003 (LNCS vol 2657) (Melbourne, Australia and St. Petersburg, Russia: Springer Berlin / Heidelberg) p 264
[3] CASTOR - CERN Advanced Storage Manager, http://castor.web.cern.ch/castor/
[4] Fuhrmann P dCache, the Overview, http://www.dcache.org/manuals/dcache-whitepaper-light.pdf
[5] DPM, http://goc.grid.sinica.edu.tw/gocwiki/How_to_install_the_Disk_Pool_Manager_%28DPM%29
[6] Corso E, Cozzini S, Donno F, Ghiselli A, Magnoni L, Mazzucato M, Murri R, Ricci P P, Terpin A, Vagnoni V, Zappi R and Stockinger H 2006 Proc. CHEP06
[7] Scientific Linux CERN 3 (SLC3), http://linux.web.cern.ch/linux/scientific3/
[8] Fadden S 2007 The next step forward in storage virtualization: IBM General Parallel File System (GPFS) and IBM TotalStorage SAN File System (SFS) come together, IBM whitepaper
[9] Allcock W 2003 GFD-R-P.020: GridFTP: Protocol Extensions to FTP for the Grid, http://www.ogf.org/documents/GWD-R/GFD-R.020.pdf
[10] Mandrichenko I 2003 GFD.21: GridFTP Protocol Improvements, http://www.ggf.org/documents/GFD.21.pdf