DFSgc: Distributed File System for Multipurpose Grid Applications and Cloud Computing
Carlos de Alfonso, Miguel Caballer, José V. Carrión, Vicente Hernández
Instituto ITACA - Universidad Politécnica de Valencia
email: [calfonso, micafer, jocarbur, vhernand]@itaca.upv.es
Abstract. Grid computing currently needs support for managing huge quantities of storage. Most Grid deployments only provide local storage and lack essential features such as decentralization of the file catalogue. This paper explains the design of a distributed file system whose aim is to provide virtual storage for multipurpose applications. A fully distributed catalogue system tolerates faults, so the entire catalogue of files remains available at all times. The idea is to let users and applications manage files as in a local storage system, hiding the physical addressing. An intelligent replication system uses heuristic techniques to bring data closer to the Grid applications, so that the applications avoid spending time on transfers.
1 Introduction
Grid computing has evolved and currently needs support for managing huge quantities of storage. Most Grid deployments only provide local storage, so applications must guess where the data files are effectively stored and transfer them to where they are needed. There are several approaches to virtualization of data storage, such as EGEE's Replica Service [1], the LCG File Catalog [2], the Replica Location System from Globus and DataGrid [3], the Hadoop Distributed File System [4] and Microsoft's Distributed File System [5]. None of them has been consolidated by the community, because they omit essential features such as:
- Decentralization of the file catalogue, in order to avoid a central server that manages the entire catalogue of files.
- Seamlessly bringing the data closer to where it is needed, or automating intelligent replication.
- Creating a uniform namespace, as current systems assume that applications will know how files are named and organized.
Most of these file systems were created for specific purposes, often as part of a project, so they focus on the specific needs of the data managed in that project. In recent years Grid computing has achieved some stability, and its usage has become general in scientific environments for multipurpose applications. The advent of Cloud computing has also exposed the need for ubiquitous storage systems which enable applications to distribute data files seamlessly, without creating specific networks or storage systems, or resorting to artifices to take advantage of existing ones. There is a need for infrastructure services which provide access to a common storage system. Such a common distributed file system is, in computing terms, similar to a scheduler which provides seamless access to the resources related to it. This paper introduces the design of DFSgc (which stands for Distributed File System for Grid and Cloud computing), a distributed file system whose aim is to provide virtual storage for multipurpose applications, adapted to the characteristics of distributed technologies such as Grid and Cloud computing. The paper is organized as follows: Section 2 analyzes the state of the art in Grid and multipurpose file systems. Section 3 describes the DFSgc file system. Section 4 shows a use case of a write operation to illustrate the functionality of the system. Finally, the last section presents some conclusions and further work.
2 Previous Works
Some of the requirements that Grid and Cloud computing impose on a distributed file system stem from the distributed nature of the computation and of the access to data files. Others are brought by the applications themselves, in order to ease their porting to these distributed technologies. Apart from the obvious requirement of storing data files, the requirements include:
- Virtualization of the file namespace, to manage filenames that are not tied to any physical storage location. It also enables organizing files into directories and subdirectories.
- Distribution of the files among the storage nodes, to take advantage of the aggregate storage capacity.
- Replication of files, in order to provide redundancy and to obtain aggregated bandwidth when reading from different clients.
- Exploiting locality of access to the files and their replicas, bringing the physical data files to where they are needed, to reduce network transfers.
- Coherence of replicas, as the file system must provide the same data for a single file regardless of the chosen replica.
- Decentralization of the file catalogue, in order to avoid a central server that manages the entire catalogue of files.
- Failure tolerance, so the file system keeps working when parts of it fail or disappear.
- Ease of use, by providing simple interfaces that can be used in multiple environments.
Table 1 summarizes how common Grid and multipurpose file systems address these requirements.
File System     | Distributed Catalog | Replication Support | Fault Tolerance | Grid Oriented | Replica Coherency | Deterministic Discovery | Heuristic Storage
LCG Catalog     | Partial             | YES                 | Medium          | YES           | YES               | YES                     | NO
Repl. Globus    | Partial             | YES                 | Medium          | YES           | NO                | NO                      | NO
Hadoop          | NO                  | YES                 | Low             | YES           | YES               | YES                     | NO
Microsoft's DFS | NO                  | YES                 | Low             | NO            | YES               | YES                     | NO
P2P storage     | YES                 | YES                 | High            | NO            | YES               | YES                     | NO

Table 1: Some features of the main distributed file systems.
In this table, the columns refer to (from left to right): provision of a distributed catalogue, replication support, failure tolerance, orientation to Grid computing, replica coherence, deterministic discovery of files, and heuristics for providing intelligent storage. The main shortcomings of current systems are:
- Locality of storage: replicas are far from the consuming application, so it spends time on network transfers.
- Centralization: a central server manages the entire catalogue of files.
- Partial functionality: storage systems are implemented for a specific purpose, so it is difficult to adapt them to multipurpose systems. The files in the system are stored without a clear heuristic model.
3 Description of DFSgc
This section describes the architecture of DFSgc (Distributed File System for Grid and Cloud computing). The system is composed of three layers, briefly described below:
- Interface Layer: specifies a simple API used by the applications to execute file operations as in a local storage system. This layer hides the physical location of the files, so the applications refer to a global namespace when they need to read or write a file.
- Logic Layer: defines an intelligent model to replicate files near the Grid applications. A fully distributed catalogue keeps the metadata information separate from the physical location of the files.
- Storage Layer: abstracts the protocol used to transfer files (FTP, HTTP, GridFTP, etc.). The storage layer also hides the final way the files are stored, so it is possible to use a remote storage server or a simple local storage system.
3.1 Interface Level
Grid applications need to manage files as in a local storage system, so the interface level provides a simple API to read, write and delete files from a common catalogue. DFSgc defines an API composed of three basic operations: get (download a file), put (upload a file) and delete. To access the DFSgc system, an application operates directly with a known node, so the internal structure of the system remains hidden from applications and users. A global namespace is used to refer to the files in the catalogue system. Several naming schemes are possible, for instance: dom.subdom.file, /dir/subdir/file, ZXCV123, etc.
The put operation enables the user not only to upload a file but also to request a given number of replicas. The user thus decides how many replicas are needed, but the DFSgc replication system establishes where to effectively store them according to several policies; a minimal sketch of the resulting client API is shown below.
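The following sketch (in Python) illustrates how this three-operation API could look from the application side. It is only an illustration under assumptions: the paper does not define a concrete binding, so the class name DFSgcClient, the entry_node object and the parameter names are hypothetical.

```python
# Illustrative sketch of the three-operation DFSgc client API described
# above. All names (DFSgcClient, entry_node, n_replicas) are hypothetical:
# the paper specifies the operations (get, put, delete) and the global
# namespace, not a concrete binding.

class DFSgcClient:
    def __init__(self, entry_node):
        # Applications talk to any known node; the internal
        # structure of the system stays hidden behind it.
        self.entry_node = entry_node

    def put(self, name, local_path, n_replicas=1):
        """Upload a file under a global name, requesting a minimum
        number of replicas; the replication system decides where
        the replicas are effectively stored."""
        with open(local_path, "rb") as f:
            self.entry_node.put(name, f.read(), n_replicas)

    def get(self, name, local_path):
        """Download the file registered under a global name."""
        data = self.entry_node.get(name)
        with open(local_path, "wb") as f:
            f.write(data)

    def delete(self, name):
        """Remove the file (and its replicas) from the catalogue."""
        self.entry_node.delete(name)

# Usage: names follow the global namespace, not physical locations, e.g.
#   client = DFSgcClient(entry_node)
#   client.put("/dir/subdir/file", "file.dat", n_replicas=3)
```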
3.2 Storage Level
The storage level abstracts the storage infrastructure, so the files can be stored in the local file system of the node or delegated to remote servers. In the second case the DFSgc node acts as a front-end, and the files are physically stored in a specific storage element. To perform the transfers, Grid applications can select a suitable transfer protocol to put the files into the system. Several standard transfer protocols can be used (FTP, HTTP, GridFTP), and DFSgc provides a plug-in-like model, so that adding new protocols is very simple; a sketch of such a model follows.
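As a hedged illustration of the plug-in-like protocol model, the sketch below defines a common interface behind which concrete protocols can be registered. The TransferProtocol interface and the registry are assumptions made for this example; the paper only states that protocols such as FTP, HTTP or GridFTP are pluggable behind the storage layer.

```python
# Sketch of a plug-in-like transfer protocol model: every protocol
# implements the same interface, and new protocols are added simply by
# registering another implementation.
import urllib.request
from abc import ABC, abstractmethod

class TransferProtocol(ABC):
    """Common interface every transfer protocol plug-in implements."""

    @abstractmethod
    def download(self, url, local_path): ...

    @abstractmethod
    def upload(self, local_path, url): ...

class HTTPTransfer(TransferProtocol):
    def download(self, url, local_path):
        urllib.request.urlretrieve(url, local_path)

    def upload(self, local_path, url):
        with open(local_path, "rb") as f:
            req = urllib.request.Request(url, data=f.read(), method="PUT")
            urllib.request.urlopen(req)

# Registry: the scheme of the URL selects the plug-in.
PROTOCOLS = {"http": HTTPTransfer()}

def transfer(url, local_path):
    scheme = url.split("://", 1)[0]
    PROTOCOLS[scheme].download(url, local_path)
```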
3.3 Logic Level
This layer defines the main modules in the DFSgc architecture. Fig. 1 shows the interaction scheme; the functionality of each module is described below:
- Server: waits for file requests and coordinates the rest of the modules.
- Replication system: decides the number and location of copies according to the heuristic model.
- Statistics: collects historic information about files, for use by the replication system.
- Catalogue system: maintains the metadata information of each file.
- Discovery system: finds the node that stores the metadata information of each file.
Fig. 1: Layer architecture of DFSgc.
3.3.1 Replication System
The analyzed distributed file systems spend a lot of time transferring files from distant locations when a Grid application needs to access them. To solve this, DFSgc proposes a replication system that brings files near the applications. Applications can specify the minimum number of required copies of their uploaded files, but the system is responsible for selecting the effective storage nodes for the replicas. The replication system uses the information provided by the statistics module to select the most suitable nodes to store the files. It can create more replicas than requested by the file owner in order to improve performance, moving the files closer to the nodes that use them.
Maintaining different replicas of the same file can lead to incoherent copies. To solve this, DFSgc applies cache coherence techniques to ensure the integrity of all replicas. A distributed file system can be viewed as a multiprocessor system where the file system plays the role of a global memory composed by the aggregation of partial file systems: a processor corresponds to a node, a cache to the distributed storage, blocks of memory to files, and the main memory to the global file catalogue. There are several proven approaches to cache coherence, and DFSgc adopts a directory-based coherence model to ensure that the file replicas are always up to date. Basically, the coherence scheme works by invalidating copies when a file is modified: a write operation marks all replicas of the file as invalid, except the written one. The system also avoids unnecessary invalidations and transfers when an application writes the same replica repeatedly. If an application tries to read an invalidated replica, the system finds a valid copy of the file and sends it to the requesting node. A read operation from a node where the requested file has no replica implies the creation of a new replica on that node, to speed up subsequent reads of that file from the same node.
The Grid enables applications to access different replicas of the same file concurrently, so some blocking procedures are needed to avoid inconsistencies. DFSgc establishes a write-blocking procedure that gives read operations priority over writes. When a write operation starts, the system blocks write operations on all replicas of the written file, while still allowing read operations on them. When the write operation completes successfully, the currently active read operations are cancelled. Read operations are available in any other interval of time (Fig. 2 shows the complete blocking procedure; a minimal sketch of the invalidation and blocking scheme follows the figure).
Fig. 2: Concurrent access to files.
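The sketch below illustrates the directory-based, write-invalidate coherence and write-blocking behaviour just described. The Directory class, the replica sets and the store/fetch callables are assumptions made for illustration (and the cancellation of active reads on write completion is omitted); the paper describes the protocol, not an implementation.

```python
# Sketch of directory-based coherence with write-invalidate semantics:
# the owner node keeps, per file, which replicas are valid, and a write
# lock that blocks concurrent writes while reads stay allowed.
import threading

class Directory:
    """Per-file directory entry kept by the file's owner node."""

    def __init__(self, replicas):
        self.valid = set(replicas)   # nodes holding an up-to-date copy
        self.invalid = set()         # nodes holding a stale copy
        self.write_lock = threading.Lock()

    def write(self, writer_node, data, store):
        # Only one writer at a time: other writers block here, while
        # reads on the currently valid replicas remain possible.
        with self.write_lock:
            store(writer_node, data)
            # Invalidate every replica except the written one.
            stale = (self.valid | self.invalid) - {writer_node}
            self.valid, self.invalid = {writer_node}, stale

    def read(self, reader_node, fetch, store):
        if reader_node in self.valid:
            return fetch(reader_node)
        # Stale or missing replica: fetch a valid copy and install a new
        # replica locally, to speed up future reads on this node.
        data = fetch(next(iter(self.valid)))
        store(reader_node, data)
        self.invalid.discard(reader_node)
        self.valid.add(reader_node)
        return data
```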
3.3.2 Catalogue and Discovery System
Each node in DFSgc manages metadata information for a partial catalogue of files, so every file in the system is associated with an owner node which keeps track of the physical location of its replicas. When a server in DFSgc wants to retrieve a file, two questions need to be resolved:
- Does anyone know anything about the file? A discovery system is needed to find the owner node of the file.
- Where are the replicas of the file? The owner node manages the catalogue entry of the file, so the physical location of every replica can be obtained from it.
The discovery and catalogue systems are completely independent of the storage system: the replicas of a file are stored in suitable storage resources, and its metadata information is retrieved from separate locations in the system. The next sections briefly describe the catalogue and discovery systems provided by DFSgc.
3.3.2.1 Discovery System
To discover the owner node of a file, DFSgc provides a deterministic search model which guarantees finding the owner of a file whenever it exists in the system. This design applies a peer-to-peer architecture to resolve the deterministic search. There are currently several P2P designs and implementations of distributed search systems (Pastry [6], Chord [7], Tapestry [8], etc.); DFSgc takes advantage of the ring topology provided by the Pastry architecture, which enables deterministic search and stability. Basically, the discovery system can be viewed as a large ring where each node has a unique identifier, obtained by applying a hash function to the IP address, MAC address or DNS name of the node. The nodes are ordered in the ring by their identifiers, and every node knows its closest neighbours. Similarly, every file has a unique identifier obtained by hashing its name. Each file belongs to the node with the nearest identifier, so it is always possible to find the owner node of a file (see the sketch at the end of Section 3.3).
3.3.2.2 Catalogue System
The global catalogue is composed of small partial catalogues scattered across all the nodes of the storage file system. Its main function is to store metadata about the managed files (replica locations or the table of valid stored files); it also manages extra data to maintain the coherence of the files. Every owner node makes synchronous backup copies of its neighbours' catalogues to increase failure tolerance. To guard against network failures, the hash function ensures that contiguous nodes in the ring are not necessarily in the same subnet.
3.3.3 Statistics Module
The statistics module stores historic information about accesses to the files: the number and type of operations, the type of applications accessing them, the replica locations over time, etc. The replication system interacts directly with the statistics module, and a heuristic function then decides the most suitable location for a specific file.
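The sketch below illustrates the hash-based owner lookup of Section 3.3.2.1. It is a simplification under assumptions: Pastry routes in O(log N) hops using partial routing tables, whereas this sketch contracts the search to a direct lookup over a fully known sorted ring, and it assigns a file to the first node at or after its identifier (one concrete reading of "nearest identifier"), purely for illustration.

```python
# Hash-based identifiers on a ring: nodes and files share one identifier
# space, and each file is owned by the node nearest to its identifier.
import bisect
import hashlib

def ident(name, bits=32):
    """Unique identifier for a node (IP/MAC/DNS name) or a file name."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

def owner_of(filename, node_ids, bits=32):
    """Owner = first node at or after the file id, wrapping around
    the ring, so the same name always resolves to the same node."""
    fid = ident(filename, bits)
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, fid)
    return ring[i % len(ring)]

# Usage: deterministic search, no central server involved.
nodes = [ident(f"node{i}.example.org") for i in range(8)]
print(owner_of("/dir/subdir/file", nodes))
```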
4 Use Case
The example shown in Fig. 3 describes a use case of a write operation. In step 1, a Grid application needs to write a file (File1) into the storage system, so the application contacts the nearest known node of DFSgc and invokes the PUT operation. The application waits while node A internally initiates the write procedure. In step 2, node A uses the discovery infrastructure to find the owner node of File1 (node B). The process ends when node B responds to node A with its identifier (step 3); node A then knows that the owner of File1 is node B and must request the PUT operation of File1 from it in the next step. Node A contacts node B and requests a write operation on File1; node B then blocks write operations (step 4) while node A puts one replica of File1 (step 5) on a storage node of the system (in this case node D). While node A is writing File1, other nodes may read a replica of this same file, but they cannot write File1. Finally, node A finishes writing File1 successfully, so the owner node of File1 invalidates all the old copies of File1 and unblocks write operations (step 6). The sketch after Fig. 3 summarizes this sequence.
Fig. 3: Use Case for a write operation.
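As a summary of the six-step sequence of Fig. 3, the sketch below ties together the discovery, blocking and invalidation pieces shown earlier. The node objects and their method names (discover_owner, block_writes, store, invalidate_other_replicas) are hypothetical; only the ordering of the steps comes from the paper.

```python
# Illustrative orchestration of the PUT use case, as executed by the
# node the application contacted (node A in Fig. 3).
def put_file(app_node, filename, data, storage_node):
    # Steps 2-3: deterministic discovery of the owner node (node B).
    owner = app_node.discover_owner(filename)
    # Step 4: the owner blocks concurrent writes; reads stay allowed.
    owner.block_writes(filename)
    try:
        # Step 5: store one replica on the selected storage node (node D).
        storage_node.store(filename, data)
        # Step 6: invalidate old copies and register the new replica.
        owner.invalidate_other_replicas(filename, keep=storage_node)
    finally:
        owner.unblock_writes(filename)
```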
5 Conclusions and Future Work
This paper has introduced the design of a distributed file system named DFSgc, whose main purpose is to provide virtual and intelligent storage for multipurpose Grid applications and ubiquitous systems. Previous works have been analyzed, revealing two main shortcomings in most of them: a central server manages the entire catalogue of files, and the files are stored without a clear heuristic model.
DFSgc employs a Grid catalogue to provide transparent access to the applications. Its robust discovery system, based on peer-to-peer architectures, provides high fault tolerance, and its replication model introduces heuristic techniques to bring files near the applications. A first implementation exists, but further work is needed on several aspects:
- New heuristic models for the replication system, to improve the performance of the transfers and the robustness of the system against failures of storage nodes.
- Security modules at two levels: first in the storage and transfer levels, and then at a higher level to control the access of applications to the system.
References
1. J.-P. Baud, J. Casey, S. Lemaitre, C. Nicholson, G. Stewart. LCG Data Management: From EDG to EGEE. CERN, European Organisation for Nuclear Research, Geneva, Switzerland.
2. LCG File Catalog administrators' guide. https://twiki.cern.ch/twiki/bin/view/LCG/LfcAdminGuide
3. A. Chervenak et al. Giggle: A Framework for Constructing Scalable Replica Location Services. Proc. of the SC2002 Conference, 2002.
4. The Apache Hadoop project. http://hadoop.apache.org/core/
5. Microsoft Distributed File System. http://technet.microsoft.com/en-us/library/cc738688.aspx
6. A. Rowstron, P. Druschel. Pastry: Scalable, Decentralized Object Location and Routing for Large-Scale Peer-to-Peer Systems. IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), pp. 329-350, 2001.
7. I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, H. Balakrishnan. Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications. IEEE/ACM Transactions on Networking, pp. 17-32, 2003.
8. B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, J. D. Kubiatowicz. Tapestry: A Resilient Global-Scale Overlay for Service Deployment. IEEE Journal on Selected Areas in Communications, vol. 22, no. 1, 2004.