Network Storage Management in Data Grid Environment

Proceedings of the 2nd Annual International Workshop on Grid and Cooperative Computing (GCC 2003), pp. 879-886, Shanghai, China, December, 2003

Shaofeng Yang, Zeyad Ali, Houssain Kettani, Vinti Verma and Qutaibah Malluhi
Department of Computer Science
Jackson State University
Jackson, MS 39217
[email protected] {zeyad.f.ali, houssain.kettani, vinti.verma, qmalluhi}@jsums.edu

Abstract. This paper presents the Network Storage Manager (NSM) developed in the Distributed Computing Laboratory at Jackson State University. NSM is designed as a Java-based, high-performance, distributed storage system that can be utilized in the Grid environment. The NSM architecture presents a framework offering parallelism, scalability, crash recovery, and portability for data-intensive distributed applications. Unlike several parallel research efforts, this paper introduces an architecture that is independent of systems and protocols. Therefore, the system can run in a typical heterogeneous Grid environment. We illustrate how NSM incorporates GridFTP and GSI authentication. We also provide a brief evaluation of the system performance.

1 Introduction

Grid technologies [9, 10] are becoming more popular and have been applied to various computational fields. The Grid infrastructure can support the sharing and coordinated use of diverse resources in dynamic and distributed virtual organizations [10]. Data-intensive applications, such as experimental analysis, simulations and visualizations, require high-rate data access to huge data sets. Moreover, since many Grid applications deal with remote data and remote users that are often geographically distributed, a major challenge in building the computational Grid is providing an efficient distributed data storage environment.

A number of research projects in the scientific field have targeted enhancing the performance, security, scalability, and reliability of data-intensive distributed Grid applications. Some solutions have focused on tuning network parameters, such as setting the correct TCP buffers and using parallel streams to optimize the performance [7]. However, this approach requires the implementation of a Linux kernel-specific tuning daemon. Other efforts are designing new TCP stacks, which are operating system-related and nonstandard [12, 5]. The traditional operating-system-centered solutions limit the control and management of storage resources to the kernel. Applications are limited to the policies and implementations provided by the system. The problem with this approach is that applications have different requirements. Therefore, storage policies suitable for one application may lead to poor performance and behavior for others. To avoid these limitations, two distributed storage systems are also being developed: the Armada parallel file system [18] and the Storage Resource Broker (SRB) [4]. However, Armada does not handle crash recovery, while SRB is not application-controlled and its simple backup mechanism is not cost-effective.

Thus, in order to handle the high-rate data access requirements of data-intensive applications in the heterogeneous Grid environment, a general solution with the following features is desired: system-independent and application-controlled; high performance; cost-effective data recovery; integration with core Grid services. The Network Storage Manager (NSM), a Java-based software system, has been developed for this purpose in the Distributed Computing Laboratory at Jackson State University. This paper discusses how NSM is designed and developed to meet all the requirements above.

The rest of this paper is organized as follows. Section 2 introduces the NSM architecture and its application-controlled features. In Section 3, we show how NSM utilizes GridFTP and GSI in the Grid environment. In Section 4, we evaluate system performance regarding parallel TCP streams, FTP, and GridFTP. Finally, a summary and concluding remarks are presented in Section 5.

2 NSM System Architecture

NSM has a unique architecture that provides many advantages, including high performance, reliability, self-healing, load balancing and seamless access. NSM utilizes multiple parallel data streams to achieve load balancing and high data rates. NSM delivers reliable storage by encoding redundant blocks of data and distributing the generated redundant data, which gives the system the ability to restore any missing or long-delayed data and to heal any damaged or corrupted data sets automatically. The NSM approach is much more cost-effective than replication because its encoded redundant data blocks are much smaller in size than the original data. Applications that use NSM for their data storage automatically inherit all of these merits and features.

2.1 Data Layout over Storage Servers

As illustrated in Figure 1, NSM partitions a data set into a number of small data blocks. The partitioning algorithm may be a standard fixed-size algorithm or an application-provided algorithm. The system distributes the blocks across multiple data servers. To enable dependable service, NSM uses coding to add redundancy to the original data. This redundancy enables applications to retrieve the original data even if a portion of the data is unavailable due to server and/or network failure. Data blocks and their corresponding parity blocks are grouped in married blocks. A married block contains one block for each data and parity server. Selecting the blocks in each married block is an application issue and depends on its decision on the suitable data layout. To ensure load balancing, NSM distributes the blocks of a married block to distinct servers.

Fig. 1. Data Set Layout Over Storage Servers.
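The layout described in Section 2.1 can be illustrated with a minimal Java sketch. The paper does not name the coding scheme NSM uses, so the sketch assumes the simplest possible one: fixed-size partitioning and a single XOR parity block per married block, which tolerates the loss of one block. The class and method names (`MarriedBlockSketch`, `partition`, `xorParity`, `reconstruct`) are hypothetical and are not part of NSM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: fixed-size partitioning plus one XOR parity block (an assumed code). */
final class MarriedBlockSketch {

    /** Split data into fixed-size blocks; the last block is zero-padded. */
    static List<byte[]> partition(byte[] data, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            // copyOfRange pads with zeros when the range runs past the end of data
            blocks.add(Arrays.copyOfRange(data, off, off + blockSize));
        }
        return blocks;
    }

    /** XOR the data blocks of one married block into a single parity block. */
    static byte[] xorParity(List<byte[]> blocks) {
        byte[] parity = new byte[blocks.get(0).length];
        for (byte[] block : blocks) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= block[i];
            }
        }
        return parity;
    }

    /** Rebuild the single missing data block (marked by null) from the survivors and the parity. */
    static byte[] reconstruct(List<byte[]> blocksWithGap, byte[] parity) {
        byte[] missing = parity.clone();
        for (byte[] block : blocksWithGap) {
            if (block == null) {
                continue; // this is the block being rebuilt
            }
            for (int i = 0; i < missing.length; i++) {
                missing[i] ^= block[i];
            }
        }
        return missing;
    }
}
```

Under this assumed scheme, a married block of k data blocks carries one extra block of redundancy, so the storage overhead is 1/k of the data size, whereas keeping a full replica costs 100%. A code that tolerates more than one failure would need additional parity blocks.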

2.2 Distributing Data Sets

In Figure 2, we demonstrate how married blocks are buffered for uploading using NSMWriter. Efficient data service is then achieved by using multiple concurrent streams established between the client and the distributed data servers. After all the blocks of the data source have been uploaded, the metadata, which contains the information describing the data set and its distribution configuration, is obtained from the layout algorithm and uploaded to one or more designated meta servers.

Fig. 2. Distributing Data Sets Using NSMWriter.

2.3 Data Retrieval

The system offers application-transparent and seamless access to the physically distributed data sets. Applications can use NSM as a high-performance random input stream. An application can open multiple data sets at a time using the same NSMReader. Each data set has its own buffer. A prefetching mechanism tries to have the blocks most likely to be requested in memory before the application asks for them. Application requests have higher priority than prefetching requests. A request for a single block results in requests for all the blocks in the corresponding married block. The requests are queued and served according to their priority.
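The request ordering described for NSMReader, where application requests are served before prefetching requests, can be sketched with a priority queue. This is only an illustration under assumptions: the paper does not describe the queue implementation, and the names `BlockRequestQueue`, `Kind`, and `Request` are hypothetical. Ties within the same priority are broken by arrival order, which is also an assumption.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical block-request queue in which application requests outrank prefetch requests. */
final class BlockRequestQueue {

    /** Declaration order doubles as priority order: APPLICATION is served first. */
    enum Kind { APPLICATION, PREFETCH }

    static final class Request implements Comparable<Request> {
        final Kind kind;
        final int blockId;
        final long seq; // arrival number, used to keep FIFO order within one kind

        Request(Kind kind, int blockId, long seq) {
            this.kind = kind;
            this.blockId = blockId;
            this.seq = seq;
        }

        @Override
        public int compareTo(Request other) {
            int byKind = kind.compareTo(other.kind);
            return byKind != 0 ? byKind : Long.compare(seq, other.seq);
        }
    }

    private final PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>();
    private final AtomicLong counter = new AtomicLong();

    /** Enqueue a request for one block. */
    void submit(Kind kind, int blockId) {
        queue.add(new Request(kind, blockId, counter.getAndIncrement()));
    }

    /** Returns the next request to serve, or null if nothing is pending. */
    Request next() {
        return queue.poll();
    }
}
```

A download worker would repeatedly call `next()` and fetch the returned block. Expanding a single request into requests for the whole married block is left out of the sketch.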

If a data set buffer is full and more requests arrive, a cache management algorithm decides which blocks to discard. The standard cache replacement policy discards the least recently used blocks. As shown in Figure 3, the blocks are downloaded in parallel from their storage servers using asynchronous system calls. The system recovers from any server failure or network delay by transparently switching to any of the available parity servers. Missing data blocks are reconstructed by decoding the corresponding parity blocks. On-the-fly data recovery leads to high reliability without sacrificing performance.

Fig. 3. Handling application requests by NSMReader.

NSM is also an application-controlled framework. The data layout model, partitioning algorithm, prefetching algorithm, cache replacement policies, and metadata are fully controlled by the application. NSM allows developers to specify or plug in their own data transfer protocols and authentication mechanisms. For example, users or developers can use the built-in FTP and HTTP protocols, or their own customized protocols, for traditional distributed systems. GridFTP is provided and supported in NSM for Grid systems.

Two sample applications utilizing NSM features have been built on top of NSM. One displays a terrain image by reading image tiles as needed from distributed servers [16]. The other is a video player client application that plays frames from different parallel remote servers and provides frame reconstruction and frame skipping features [17].

Generally speaking, the pure Java implementation and the independence from any particular data transfer protocol give NSM portability and platform independence. Parallel streaming, load balancing, and buffering provide high performance for high-rate data access. The encoding and decoding scheme makes NSM cost-effective for data recovery. In the next section, we address how NSM integrates with Grid services.
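As an illustration of the standard cache replacement policy mentioned in Section 2.3, the following minimal Java sketch shows a least-recently-used block buffer. It is not the NSM implementation: the class name `LruBlockCache` and the choice of `LinkedHashMap` are assumptions made for brevity, and an application-supplied policy would replace the eviction rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-data-set block buffer with least-recently-used eviction. */
final class LruBlockCache extends LinkedHashMap<Integer, byte[]> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruBlockCache(int capacity) {
        // accessOrder = true: get() and put() move an entry to the most-recently-used end
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
        // Called after each insertion; evicts the least recently used block on overflow
        return size() > capacity;
    }
}
```

With a capacity of two blocks, inserting blocks 1 and 2, reading block 1, and then inserting block 3 evicts block 2, because block 1 was used more recently.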

3 NSM in the Grid Environment

To build the Grid environment, some commercial solutions are available. In addition, Globus is considered the most widely utilized open-source toolkit for building Grid applications [10]. The Globus Toolkit is fully compatible with the Open Grid Services Architecture (OGSA), which is the standard that defines the Grid service and its related mechanisms, protocol bindings, and integration with native platform facilities [10]. Therefore, the Globus Toolkit 2.0 was selected as the platform for developing and testing our Grid-enabled NSM.

The Globus data management architecture, one of the fundamental components of Globus, provides the GridFTP service for Grid computing environments. GridFTP is therefore a basic Grid protocol for transferring data between Grid nodes. GridFTP extends FTP with new features and provides several advantages over FTP [1, 2, 3], such as Grid Security Infrastructure (GSI) authentication [11, 8, 6], which is a standard and secure authentication mechanism in the Grid environment, as well as third-party control, striping, and partial file access [3]. For NSM to run in the Grid environment, it has to support this standard Grid data transfer protocol.

The modular and programmable architecture of NSM permitted us to implement a GridFTP module as one of the application-controlled NSM plug-ins. Implementing and utilizing a GridFTP pluggable module was the first important step required for running NSM in the Grid environment. Our implementation took advantage of the Java CoG Kit [14, 15], which is based on the Globus Java API and provides a GridFTP client as well as mappings to commonly used Grid services including GSI and LDAP. Thus, we utilized Java CoG to implement the data transfer protocol interface of NSM. In addition to GridFTP, NSM is currently capable of utilizing FTP, HTTP and NSM-specific data transfer protocols.
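The idea of a pluggable data transfer protocol can be sketched as a small interface plus a registry keyed by protocol name. The paper does not publish the actual NSM protocol interface, so everything here (`TransferProtocol`, `ProtocolRegistry`, `fetch`, `register`, `lookup`) is a hypothetical stand-in. A GridFTP plug-in would delegate to a client library such as the Java CoG Kit, which is deliberately not shown.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical plug-in point for data transfer protocols; not the actual NSM interface. */
interface TransferProtocol {
    /** Fetch one block from a server; a real plug-in would open a network connection here. */
    byte[] fetch(String host, String blockName) throws IOException;
}

/** Hypothetical registry that maps a protocol name such as "ftp" or "gridftp" to its plug-in. */
final class ProtocolRegistry {
    private final Map<String, TransferProtocol> plugins = new HashMap<>();

    void register(String scheme, TransferProtocol protocol) {
        plugins.put(scheme.toLowerCase(Locale.ROOT), protocol);
    }

    TransferProtocol lookup(String scheme) {
        TransferProtocol protocol = plugins.get(scheme.toLowerCase(Locale.ROOT));
        if (protocol == null) {
            throw new IllegalArgumentException("no plug-in registered for " + scheme);
        }
        return protocol;
    }
}
```

With such a registry, the reader and writer code depends only on the interface, so adding a new protocol means registering one more implementation rather than changing the storage layer.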

The NSM pluggable architecture supports GSI authentication. GSI, which is implemented on top of the Generic Security Service application program interface (GSS-API), provides authentication and authorization services using public key certificates as well as Kerberos authentication [11, 8, 6]. GSI is also a fundamental component of Globus and has been bound to Grid services as a standard authentication mechanism in the Grid environment. Since GSI authentication in the Java CoG Kit is not compatible with GSS-API, and GSS in Sun Java JSDK 1.4 is not yet pluggable [13], NSM adopted the Java Authentication and Authorization Service (JAAS) in Sun Java JSDK 1.4 as its authentication framework. JAAS is designed to provide a general and standard authentication and authorization framework as well as a programming interface [13]. The JAAS authentication framework implements a Java version of the Pluggable Authentication Module (PAM). Thus, it allows users and developers to plug in their own authentication mechanism. In our case, this mechanism is GSI authentication. GSI authentication is implemented with a JAAS interface so that GSI authentication in NSM will work with other Grid services and protocols. As a result, future application developers using NSM as a network storage layer can design, implement, and plug their own authentication mechanisms into the system. Meanwhile, the NSM authentication infrastructure adds Sun Java GSS under JAAS to support Kerberos authentication. Figure 4 illustrates the NSM authentication framework. The username and password module of JAAS works well with traditional data transfer protocols such as FTP and HTTP. The GSI authentication module is protocol-independent, although it currently works only with GridFTP. Kerberos can also be applied to a specific protocol when required.

4 Summary and Concluding Remarks

This paper shares the experience gained from building a Grid-enabled storage system. It also presents a flexible and platform-independent distributed data storage system architecture that utilizes GridFTP and GSI authentication with high performance and reliability. Hence, this setup is suitable for data-intensive Grid applications. By utilizing the NSM storage system, applications can tune their performance by selecting and implementing storage policies that are appropriate for their specific requirements. We also performed experiments comparing the performance of NSM employing GridFTP with NSM employing FTP. These experiments indicated that as the number of parallel remote data servers increases, the time to get the sample data file decreases. Although GridFTP has higher overhead than FTP, running NSM can still dramatically improve the performance of data-intensive applications in the Grid environment, in addition to providing other advantages such as reliability and self-healing.

Fig. 4. NSM Authentication Framework.

References

1. B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel and S. Tuecke (2001). “Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing”, Proceedings of the 18th Annual IEEE Symposium on Mass Storage Systems (MSS 2001), San Diego, California, April, 2001.
2. W. Allcock, A. Chervenak, I. Foster, C. Kesselman, C. Salisbury and S. Tuecke (2001). “The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets”, Journal of Network and Computer Applications, 23:187-200, 2001.
3. W. Allcock, J. Bresnahan, I. Foster, L. Liming, J. Link and P. Plaszczac (2002). “GridFTP Update January 2002”, Technical Report.
4. C. Baru, R. Moore, A. Rajasekar and M. Wan (1998). “The SDSC Storage Resource Broker”, Proceedings of the 8th Annual IBM Centers for Advanced Studies Conference (CASCON 1998), Toronto, Canada, December, 1998.
5. J. J. Bunn, J. C. Doyle, S. H. Low, H. B. Newman and S. M. Yip (2002). “Ultrascale Network Protocols for Computing and Science in the 21st Century”, White paper to US Department of Energy’s Ultrascale Simulation for Science (USS) initiative.
6. R. Butler, D. Engert, I. Foster, C. Kesselman, S. Tuecke, J. Volmer and V. Welch (2000). “A National-Scale Authentication Infrastructure”, IEEE Computer, 33(12):60-66, 2000.
7. T. Dunigan, M. Mathis and B. Tierney (2002). “A TCP Tuning Daemon”, Proceedings of the 14th Annual Supercomputing Conference (SC2002), Baltimore, Maryland, November, 2002.
8. I. Foster, N. T. Karonis, C. Kesselman and S. Tuecke (1998). “Managing Security in High-Performance Distributed Computing”, Cluster Computing, 1(1):95-107, 1998.
9. I. Foster, C. Kesselman, J. Nick and S. Tuecke (2002). “Grid Services for Distributed System Integration”, Computer, 35(6).
10. I. Foster, C. Kesselman, J. Nick and S. Tuecke (2002). “The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”, Proceedings of the 5th Global Grid Forum Workshop (GGF5), Edinburgh, Scotland, July, 2002.
11. I. Foster, C. Kesselman, G. Tsudik and S. Tuecke (1998). “A Security Architecture for Computational Grids”, Proceedings of the 5th ACM Conference on Computer and Communications Security, San Francisco, California, November, 1998.
12. S. Floyd (2002). “HighSpeed TCP for Large Congestion Windows”, Internet Engineering Task Force, June, 2002.
13. C. Lai, L. Gong, L. Koved, A. Nadalin and R. Schemers (1999). “User Authentication and Authorization in The JAVA(TM) Platform”, Proceedings of the 15th Annual Computer Security Applications Conference, Phoenix, Arizona, December, 1999.
14. G. V. Laszewski, I. Foster, J. Gawor, W. Smith and S. Tuecke (2000). “CoG Kits: A Bridge between Commodity Distributed Computing and High-Performance Grids”, Proceedings of ACM Java Grande 2000 Conference, 97-106, San Francisco, California, June, 2000.
15. G. V. Laszewski, I. Foster, J. Gawor and P. Lane (2001). “A Java Commodity Grid Toolkit”, Concurrency and Computation: Practice and Experience, 13, 2001.
16. Q. Malluhi and Z. Ali (2002). “DTViewer: A High Performance Distributed Terrain Image Viewer with Reliable Data Delivery”, 2nd Annual International Workshop on Intelligent Multimedia Computing and Networking (IMMCN 2002), 927-930, Durham, North Carolina, March, 2002.
17. Q. Malluhi and O. Aldaoud (2002). “VoD System Using a Network Storage Manager”, Proceedings of the 8th Annual International Conference on Distributed Multimedia Systems (DMS 2002), San Francisco, California, September, 2002.
18. R. Oldfield and D. Kotz (2001). “Armada: A Parallel File System for the Computational Grid”, Proceedings of the 1st Annual IEEE International Symposium on Cluster Computing and the Grid (CCGrid2001), Brisbane, Australia, May, 2001.
