Design and Implementation of Grid File Management System Hotfile

Liqiang Cao¹,², Jie Qiu³, Li Zha², Haiyan Yu², Wei Li², and Yuzhong Sun²

¹ Graduate School of the Chinese Academy of Sciences, Beijing 100080, China. [email protected]
² Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China. [email protected], {yuhaiyan,liwei,yuzhongsun}@ict.ac.cn
³ IBM China Research Laboratory, Beijing, China. [email protected]

Abstract. Hotfile is a user-level file management system. It wraps GridFTP, GASS, and any other file transfer protocol compatible with the hotfile structure into a unified vegafile protocol. Based on a virtual grid file layer and a set of basic grid file operations, users can access grid files without knowing the physical transport protocol of each file. Test results show that the overhead of the vegafile protocol is small for both file transfers and metadata operations. Further transfer experiments show that when the file size is smaller than 1 MB, Vega file copy with GASS achieves higher bandwidth, whereas for files larger than 1 MB, Vega file copy with GridFTP achieves higher bandwidth.

1 Introduction

Traditionally, a file system runs in the kernel space of the OS. It manages storage devices in computers and networks, manages users' data in files, and supplies users with a POSIX-compatible interface. A grid environment, however, is unlike a single machine or a conventional computer network: it faces resource sharing and resource co-operation problems in a dynamic environment spanning multiple virtual organizations [1]. To solve file transfer and file management problems in grid environments, many grid file systems and grid file management tools have been developed. The main obstacle in developing grid applications is the absence of a standard file operation interface. Facing different grid file systems and different grid file management tools, we designed a vegafile transfer protocol and implemented it in hotfile, which can utilize different file management protocols and give the user a unified operation interface. Hotfile wraps GridFTP, GASS, and any other file transfer protocol compatible with the hotfile structure. The vegafile layer in hotfile abstracts the physical address and physical transfer protocol of a grid file. Composed of a set of basic grid file operations, hotfile's higher-level APIs make it easy to use and manage grid files. Hotfile differs from other grid file management tools in that the hotfile server works as a grid service: it can be dynamically deployed to any computer with the Globus Toolkit installed, and it works with the most popular file transfer protocols such as GridFTP and GASS.

H. Jin, Y. Pan, N. Xiao, and J. Sun (Eds.): GCC 2004, LNCS 3251, pp. 129–136, 2004. © Springer-Verlag Berlin Heidelberg 2004

The rest of this paper is organized as follows. We first describe the structure of hotfile, then discuss the vegafile layer and the basic grid file operation set in detail. We also discuss related work. Finally, we show the current status of hotfile and our plans for its future.

2 Structure of Hotfile

Hotfile has a two-layer naming space. The first layer is the name of a Data Service, which is registered in the router service of the Vega grid or in the MDS of Globus. The second layer is the file's path within a Data Service. A Data Service contains one or more files or directories. Hotfile largely follows a client/server structure, but differs in that, where most client/server architectures have a single application server, the hotfile server can be dynamically deployed to multiple servers. The files one user stores on multiple servers constitute his unified storage space. Fig. 1 shows the logical structure of hotfile. To a single user, hotfile appears as one grid file server accessed with his grid certificate/credential. The user can access Data Services on any host trusted by a CA in the grid environment.

Fig. 1. Structure of hotfile.

A user or application can access data via hotfile's GUI or API. To use a Data Service, it must hold a certificate signed by a CA that the hotfile server trusts. Data Services trusted by the user work as parts of the user's storage space; together, all Data Services trusted by a user compose his unified storage space. The vegafile layer is the virtual file operation layer of hotfile. It binds the vegafile protocol to a physical file address and a physical file access protocol. Using the discovery mechanism supplied by the Vega grid or the Globus Toolkit, the vegafile layer first obtains the physical address of a Data Service, and then contacts that address to learn the physical file access protocol via the Data Service's PortType. Once the physical address and physical file access protocol have been obtained, a physical file management channel can be set up. Different file transfer protocols have different file access interfaces; to present a unified one, we wrapped all data operations of GridFTP and GASS behind a single Vega file operation interface, with which users can access files in a single storage space.
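As a sketch, such a unified interface might look like the following. This is a minimal illustration under our own names: the interface, the in-memory stand-in, and all method signatures are assumptions for exposition, not hotfile's actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a unified vegafile operation interface.
// Each physical protocol (GridFTP, GASS, ...) would supply one
// implementation; the vegafile layer picks it after querying the
// Data Service's PortType, so callers never see the physical protocol.
interface VegaFileChannel {
    void copy(String srcUrl, String dstDirUrl);
    void mkdir(String dirUrl);
    void delete(String dirUrl);
    List<String> list(String dirUrl);
}

// An in-memory stand-in showing how one protocol plugs in. A real
// GridFTP or GASS channel would issue protocol operations instead.
class InMemoryChannel implements VegaFileChannel {
    private final Map<String, List<String>> dirs = new HashMap<>();
    public void mkdir(String dirUrl) { dirs.putIfAbsent(dirUrl, new ArrayList<>()); }
    public void copy(String srcUrl, String dstDirUrl) {
        dirs.computeIfAbsent(dstDirUrl, k -> new ArrayList<>()).add(srcUrl);
    }
    public void delete(String dirUrl) { dirs.remove(dirUrl); }
    public List<String> list(String dirUrl) {
        return dirs.getOrDefault(dirUrl, new ArrayList<>());
    }
}
```

Clients program against `VegaFileChannel` only; swapping the physical protocol then requires no client change, which is the point of the vegafile layer.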

Data Services form the server side of hotfile. They can be deployed to servers with GT3 installed. A Data Service not only wraps the different file operation protocols, but also exposes a PortType interface. The PortType is the Data Service's management interface: it manages features of the Data Service, such as its physical access protocol and its base directory, and from the PortType a client can also retrieve service data of the Data Service. On top of the vegafile layer's interface, we implemented an advanced grid file operation API and a graphical grid file manager.

3 Vegafile Protocol

The vegafile protocol is the core of hotfile. Files in hotfile are represented by vegafile URLs; data and metadata of a file are amended through the basic hotfile operations. Hotfile has weaker semantics than UNIX file systems with respect to file locking: multiple users at different clients cannot write to a file at the same time. There are two reasons for the weaker semantics. First, the vegafile protocol is designed to be a file transfer and management protocol, not a data I/O protocol, so there are very few concurrent writes to the same file. Second, stronger semantics mean more communication between client and server; because of the high latency of a WAN, stronger semantics would spend much time in communication, which means lower performance.

A vegafile URL is a virtual file address in the hotfile system. The BNF of a vegafile URL is defined as follows:

vegafile url := scheme "://" [hostport] "/" [path]
scheme := vegafile
hostport := hostport from Section 5 of RFC 1738 [5]
path := *path | ( / *([a .. z] | [A .. Z] | - | _ | . ) )

When hotfile starts working, it first searches the Resource Router of the Vega grid or the MDS of Globus to obtain all Data Services available to the user. With the co-operation of the Data Service deployed on the server, a vegafile URL is dynamically translated to the physical address of the physical file. The vegafile layer first obtains the host's physical address and the grid service container port from the MDS of Globus Toolkit 3 or from the Resource Router developed by our Vega Grid team. Suppose it finds a Data Service at 159.226.41.23:8080/OGSA/GOS/Data/DataService; it then calls the getURLpoint and getLocalPoint functions in the PortType of that Data Service to obtain the physical file transfer protocol and the local configuration of the server, which are used to build a physical file transfer path between server and client. Once the physical path is set up, the vegafile layer can operate on files through the wrapper API of the physical protocol.
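As an illustration of the grammar above, a minimal parser can split a vegafile URL into its two naming layers. This is a sketch under our own class and field names, not hotfile's actual parser:

```java
// Sketch of parsing a vegafile URL into its two naming layers:
// the Data Service endpoint (hostport) and the path within that
// service. Class and field names here are illustrative assumptions.
class VegafileUrl {
    final String hostport; // first naming layer: Data Service endpoint
    final String path;     // second naming layer: path inside the service

    private VegafileUrl(String hostport, String path) {
        this.hostport = hostport;
        this.path = path;
    }

    static VegafileUrl parse(String url) {
        String prefix = "vegafile://";
        if (!url.startsWith(prefix)) {
            throw new IllegalArgumentException("not a vegafile URL: " + url);
        }
        String rest = url.substring(prefix.length());
        int slash = rest.indexOf('/');
        if (slash < 0) {
            return new VegafileUrl(rest, "/"); // bare endpoint, root path
        }
        return new VegafileUrl(rest.substring(0, slash), rest.substring(slash));
    }
}
```

The hostport part is what the vegafile layer resolves through the MDS or Resource Router; the path part is interpreted by the Data Service itself.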
Though the MDS or Resource Router is a component of hotfile, it is only used when the file system initializes; as part of the Globus Toolkit and the Vega Grid, it is also a necessary component of other grid systems.

Fig. 2. Steps of the vegafile layer.

Two principles decide which operations belong in the basic set and how to implement the others. The first is the minimum principle: the basic file operations cannot be decomposed further, and they can be used to implement the other operations with little client-side computation. The second is the balance principle: because of the high latency in the hotfile environment, the main bottleneck in hotfile is communication. An exist operation and a list operation return in almost the same time, so we chose the list operation for the basic set and implemented the exist and getFileInfo operations on the basis of the list operation. Following these principles, the operations in the basic file operation set are: copy, initialize, open, close, delete, mkdir, and list. The getFileInfo operation returns a vegafile class, composed mainly of the physical attributes of the vegafile. Part of the fields of the vegafile class is as follows:

private long size;
private String name;
private String date;
private byte fileType;
private String physicalpath;

With the Data Service and the virtual vegafile layer, hotfile wraps the currently most popular grid file management protocols. Fig. 3 shows hotfile's graphical user interface. The tool bar and address lists are placed at the top of the GUI; the left and right panels each present a browser for one Data Service. Users can copy, move, create, and delete files or directories between the two Data Services.
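As an aside on the balance principle, exist and getFileInfo can be derived from a single list round trip. The sketch below uses our own class and method names; listDirectory is a hypothetical stand-in for the real list operation, and the hard-coded listing is dummy data for illustration only:

```java
import java.util.Arrays;
import java.util.List;

// VegaFile mirrors the field list shown in the text.
class VegaFile {
    long size;
    String name;
    String date;
    byte fileType;
    String physicalpath;

    VegaFile(String name, long size) { this.name = name; this.size = size; }
}

class BasicOps {
    // Stand-in for the real list operation: one client/server round trip.
    static List<VegaFile> listDirectory(String dirUrl) {
        return Arrays.asList(new VegaFile("a.txt", 1024), new VegaFile("b.txt", 2048));
    }

    // exist() costs the same round trip as list(), so derive it from list().
    static boolean exist(String dirUrl, String name) {
        return listDirectory(dirUrl).stream().anyMatch(f -> f.name.equals(name));
    }

    // getFileInfo() likewise reuses the single list() round trip.
    static VegaFile getFileInfo(String dirUrl, String name) {
        return listDirectory(dirUrl).stream()
                .filter(f -> f.name.equals(name))
                .findFirst().orElse(null);
    }
}
```

Since the round trip dominates on a high-latency WAN, the extra client-side filtering is essentially free, which is exactly the trade the balance principle makes.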

4 Performance

We deployed Data Services on a computer with two Xeon 2.4 GHz CPUs, 1 GB of memory, and one 36 GB Ultra3 SCSI hard disk; its OS is Red Hat Linux 7.3 and its JVM version is 1.4.2. The client is a personal computer with a PIII 1 GHz CPU, 512 MB of memory, and a 60 GB ATA-100 hard disk; its OS is Windows XP and its JVM version is 1.4.1. Server and client are connected by Fast Ethernet.

4.1 The Overhead of the Vegafile Layer

We tested the performance of the vegafile protocol with GridFTP and with GASS. Fig. 4 and Fig. 5 compare vegafile copy against GridFTP upload and against GASS copy, respectively.

Fig. 3. GUI of hotfile.

Fig. 4. Performance of Vega copy and GridFTP upload.

After the protocol has been initialized, vegafile overhead arises only in translating from the vegafile protocol to the physical protocol. The results show that this overhead is minimal compared with the file transfer time. Fig. 4 shows the performance of copying files from 1 KB to 256 MB using Vega file copy with GridFTP; we also uploaded the same set of files using GridFTP directly. The average overhead of Vega copy with GridFTP is 0.239 seconds. In a similar test, with file sizes from 1 KB to 64 MB, we compared Vega copy with GASS against plain GASS copy; the average overhead of Vega copy with GASS is 0.242 seconds.

4.2 Performance of Vegafile with GridFTP and GASS

Because of the certificate-based security mechanism, data transfer in a grid environment is time-consuming: copying a 1 KB file using Vega copy with GridFTP takes 4.715 seconds, and with GASS 3.475 seconds.

Fig. 5. Performance of Vega copy and GASS copy.

Fig. 6. Performance of Vega copy with GridFTP and GASS.

Fig. 6 shows the performance of Vega copy with GridFTP and Vega copy with GASS. Copying a 1 MB file between client and server with GridFTP needs 4.411 seconds, and copying a 256 MB file needs 35.164 seconds. With GASS, copying a 1 MB file needs 4.390 seconds, while copying a 256 MB file needs 297 seconds. When the file size is smaller than 1 MB, GASS performs better; when the file size is larger than 1 MB, GridFTP performs better.

Table 1. Metadata performance of Vega file operations (seconds).

Operation        with GridFTP  with GASS
Initialize       10.946        9.133
Open             3.841         0.000
Close            0.044         0.000
Mkdir            0.045         0.792
Delete           0.046         0.803
List 100 files   0.536         1.026
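The crossover at 1 MB suggests a simple client-side heuristic: choose GASS for small files and GridFTP for large ones. The following sketch is our own illustration derived from the measurements, not a mechanism of hotfile itself, which binds the protocol via the Data Service PortType:

```java
// Sketch of a protocol-selection heuristic based on the measured
// crossover: GASS is faster below ~1 MB, GridFTP above it.
class ProtocolChooser {
    enum Protocol { GASS, GRIDFTP }

    // 1 MB crossover point, taken from the Fig. 6 measurements.
    static final long CROSSOVER_BYTES = 1L << 20;

    static Protocol choose(long fileSizeBytes) {
        return fileSizeBytes < CROSSOVER_BYTES ? Protocol.GASS : Protocol.GRIDFTP;
    }
}
```

A production heuristic would also weigh the per-operation costs in Table 1, since GridFTP's higher open cost matters for batches of small transfers.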

GASS transfers data over the HTTPS protocol. GASS copy therefore pays two overheads: one is encrypting the data before transfer and decrypting it afterwards; the other comes from re-encoding the file from 8-bit encoding to Base64 encoding.
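The Base64 re-encoding alone expands the payload by a factor of 4/3, since every 3 input bytes become 4 output characters. A quick check, using the standard java.util.Base64 class as our own illustration (not part of hotfile or GASS):

```java
import java.util.Base64;

// Illustration of the Base64 expansion GASS pays on the wire:
// 3 input bytes map to 4 output characters, so the payload grows
// by roughly one third before encryption even starts.
class Base64Overhead {
    static int encodedLength(int rawBytes) {
        byte[] raw = new byte[rawBytes];
        return Base64.getEncoder().encodeToString(raw).length();
    }
}
```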

Fig. 7. Time of list operation in Vega list with GridFTP.

Table 1 shows the performance of the Vega file metadata operations. Because of the communication between client and server, the initialize operation needs 10.946 seconds with GridFTP and 9.133 seconds with GASS. Vega open with GridFTP has to authorize the user's certificate, which takes 3.841 seconds; Vega open with GASS does nothing in fact, so its open time is zero. Because GASS is a stateless protocol, each of its operations must be authorized separately, so most Vega operations with GASS take longer than the corresponding operations with GridFTP. Fig. 7 shows the time to list directories containing 100 to 100,000 files using Vega list with GridFTP; we find only minimal differences in listing time across these directory sizes.

5 Related Work

There have been many efforts to implement a virtual data interface in the grid, such as SRB, IBP, and OceanStore. The SDSC Storage Resource Broker (SRB) [3] is client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network. In conjunction with its Metadata Catalog, SRB provides a way to access data sets and resources by their high-level file names rather than their physical locations. The Internet Backplane Protocol (IBP) and the exNode [7, 8] are tools designed to create a network storage substrate that adheres to the end-to-end design principles formulated in the context of the IP network architecture; Logistical Networking is a model of networking that exposes the fact that data is buffered and allows that buffering to be used to implement novel communication strategies. OceanStore [9] is a peer-to-peer storage infrastructure: a global persistent data store designed to scale to billions of users. With a Byzantine-fault-tolerant commit protocol, it provides a consistent, highly available, and durable storage utility on an infrastructure composed of untrustworthy servers.

6 Conclusion and Future Work

Hotfile provides a novel architecture for user-level grid file systems with low overhead and high usability. It has a two-layer unified Vega file naming space in a distributed environment. The hotfile server side is implemented as a service, which can be deployed dynamically to GT3 servers. In the client side's vegafile layer, we wrapped two widely used grid file transfer protocols into the vegafile protocol with little overhead. Many issues remain for future work on hotfile. The first is a client-side cache, which can shorten the request/answer path and thus accelerate hotfile operations; the second is the grid file authentication question, which is currently time-consuming. Solving it with higher performance at the same security level is quite a challenging problem.

References

1. I. Foster, C. Kesselman, J. Nick, S. Tuecke: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Open Grid Service Infrastructure WG, Global Grid Forum (2002)
2. Z. Xu: A Model of Grid Address Space with Applications. VGSD-2 (2002)
3. R. Moore, A. Rajasekar, M. Wan: The SDSC Storage Resource Broker. In: Proc. of CASCON'98, Toronto, Canada (1998)
4. The POSIX standards: http://www.opengroup.org/onlinepubs/007904975/toc.htm
5. T. Berners-Lee, L. Masinter, M. McCahill: Uniform Resource Locators (URL). RFC 1738 (1994)
6. B. White, A. Grimshaw, A. Nguyen-Tuong: Grid-based File Access: the Legion I/O Model. In: Proc. 9th IEEE Int. Symp. on High Performance Distributed Computing (HPDC), pp. 165–173 (2000)
7. M. Beck, T. Moore, J. S. Plank: An End-to-End Approach to Globally Scalable Network Storage. ACM SIGCOMM 2002, Pittsburgh, PA (2002)
8. R. L. Collins, J. S. Plank: Content-Addressable IBP: Rationale, Design and Performance. ITCC 2004, Las Vegas (2004)
9. J. Kubiatowicz et al.: OceanStore: An Architecture for Global-Scale Persistent Storage. In: Proc. of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2000)
