A Parallel and Fault Tolerant File System Based on NFS Servers

F. García, A. Calderón, J. Carretero, J. M. Pérez, and J. Fernández
Computer Architecture Group, Computer Science Department, Universidad Carlos III de Madrid, Leganés, Madrid, Spain.
E-mail:
[email protected]

Abstract

One important piece of system software for clusters is the parallel file system. Current parallel file systems and parallel I/O libraries for clusters do not use standard servers, which makes it very difficult to use these systems in heterogeneous environments. However, why use proprietary or special-purpose servers on the server side of a parallel file system when most of the necessary functionality is already available in NFS servers? This paper describes the fault tolerance implemented in Expand (Expandable Parallel File System), a parallel file system based on NFS servers. Expand allows the transparent use of multiple NFS servers as a single file system, providing a single name space. The different NFS servers are combined to create a distributed partition where files are striped. Expand requires no changes to the NFS server and uses RPC operations to provide parallel access to the same file. Expand is also independent of the clients, because all operations are implemented using RPC and the NFS protocol. Using this system, we can join heterogeneous servers (Linux, Solaris, Windows 2000, etc.) to provide a parallel and distributed partition. Fault tolerance is achieved using RAID techniques applied to parallel files. The paper describes the design of Expand and the evaluation of a prototype using the MPI-IO interface. This evaluation has been carried out on a Linux cluster and compares Expand with PVFS.

Keywords: parallel file system, NFS, data striping, clusters, RAID.
1. Introduction

(This work has been partially supported by the Spanish Ministry of Science and Technology under the TIC2000-0469 and TIC2000-0472 contracts.)
Traditional network and distributed file systems support a global and persistent name space that allows multiple clients to share the same storage devices, but they do not provide parallel access to data, making the file servers a major bottleneck in the system. The use of parallelism in file systems alleviates the growing disparity between the computational and I/O capabilities of parallel and distributed architectures. Parallelism in file systems is obtained by using several independent servers supporting one or more secondary storage devices. Data are striped among these nodes and devices to allow parallel access to different files, and parallel access to the same file. This approach increases the performance and scalability of the system. Parallelism has been used in several parallel file systems and I/O libraries described in the literature (Vesta [4], PIOUS [13], Scotch [8], ParFiSys [3, 7], Galley [14], PVFS [2], Armada [15], and ViPIOS [6]). However, current parallel file systems and parallel I/O libraries lack the generality and flexibility needed for general-purpose distributed environments, because these systems do not use standard servers; this makes them very difficult to use in heterogeneous environments such as, for example, clusters of workstations. Furthermore, none of them provides fault tolerance features. This paper presents a new approach to building parallel file systems for heterogeneous distributed and parallel systems. The result of this approach is a new parallel file system, called Expand (Expandable Parallel File System). Expand allows the transparent use of multiple NFS servers as a single file system. Different NFS servers are combined to create a distributed partition where files are striped. Expand requires no changes to the NFS server, and it uses RPC operations to provide parallel access to the same file. Using this system, we can join different servers (Linux, Solaris, Windows 2000, etc.) to provide parallel access to files in a heterogeneous cluster. Fault tolerance is achieved in Expand using RAID techniques applied to parallel files. The rest of the paper is organized as follows. Section 2 describes work related to Expand. Section 3 presents the motivations and goals of this work. The Expand design is presented in Section 4, where we describe the data distribution,
the structure of the files, the naming and metadata management, the parallel access to files, and the fault tolerance techniques implemented. Performance evaluation is presented in Section 5; the evaluation compares Expand with the performance obtained in PVFS. Finally, Section 6 summarizes our conclusions and future work.
2. Related Work

The use of parallelism in file systems is based on the fact that a distributed and parallel system consists of several nodes with storage devices. Performance and bandwidth can be increased if data accesses are exploited in parallel [20]. Parallelism in file systems is obtained by using several independent servers supporting one or more secondary storage devices. Data are striped among these nodes and devices to allow parallel access to different files, and parallel access to the same file. Initially, this idea was used in RAID (Redundant Array of Inexpensive Disks) [16]. However, when a RAID is used in a traditional file server, the I/O bandwidth is limited by the server memory bandwidth. If several servers are used in parallel, performance can be increased in two ways:

1. Allowing parallel access to different files by using several disks and servers.

2. Striping data using distributed partitions [3], allowing parallel access to the data of the same file.

The use of parallelism in file systems is different, however, from the use of replicated file systems. In a replicated file system, each disk in each server stores a full copy of a file. Using parallel I/O, each disk in each server stores only a part of the file. This approach allows parallel access to the same file. Three different parallel I/O software architectures can be distinguished: application libraries, parallel file systems, and intelligent I/O systems. Application libraries basically consist of a set of highly specialized I/O functions. These functions provide a powerful development environment for experts with specific knowledge of the problem to be modeled with this solution. Representative examples are MPI-IO, an I/O extension of the standardized message passing interface MPI, and ADIO [18], a standard API yielding an abstract device interface for portable I/O. Parallel file systems operate independently from the applications, thus allowing more flexibility and generality. Examples of parallel file systems are CFS [17], SFS [11], Vesta [4], PIOUS [13], Scotch [8], PPFS [9], ParFiSys [3], Galley [14], and PVFS [2]. Finally, an intelligent I/O system hides the physical disk access from the application developer by providing a transparent logical I/O environment. The user describes what he
wants, and the system tries to optimize the I/O requests by applying optimization techniques. This approach is used in Armada [15], ViPIOS [6], and PPFS [12]. The main problem with application libraries and intelligent I/O systems is that they often lack generality and flexibility, producing only tailor-made software for specific problems. On the other hand, parallel file systems are specially conceived for multiprocessors and multicomputers, and do not integrate appropriately into general-purpose distributed environments such as clusters of workstations. Furthermore, none of them provides fault tolerance features.
3. Motivation and Goals

The main motivation of this work is to build a parallel file system for heterogeneous, general-purpose distributed environments with fault tolerance features. To satisfy this goal, the authors are designing and implementing a parallel file system using NFS servers. The Network File System (NFS) [21] supports the NFS protocol, a set of remote procedure calls (RPCs) that provide the means for clients to perform operations on a remote file server. This protocol is operating system independent. Originally developed for use in networks of UNIX systems, it is widely available today in many systems, such as Linux or Windows 2000, two operating systems frequently used in clusters. Figure 1 shows the architecture of Expand, illustrating how multiple NFS servers can be used as a single file system. File data are striped by Expand among all NFS servers, using blocks of different sizes as the striping unit. Processes on the clients use an Expand library to access the files.

[Figure 1. Expand Architecture: clients running the Expand library access multiple NFS servers in parallel through the NFS protocol, forming a distributed partition.]

This approach offers the following advantages:

1. No changes to the NFS server are required to run Expand. All aspects of Expand operations are implemented on the clients. In this way, we can use several servers with different operating systems to build a striped partition. Furthermore, striped partitions can coexist with traditional NFS partitions without problems.

2. Expand is independent of the operating system used in the client. All operations are implemented using RPC and the NFS protocol.

3. The construction of the parallel file system is greatly simplified, because all operations are implemented on the clients. This approach is completely different from that used in many current parallel file systems, which implement both client and server sides.
4. It allows parallel access both to data of different files and to data of the same file, reducing the bottleneck represented by traditional distributed file systems.

5. It allows the use of servers with different architectures and operating systems, because the NFS protocol hides those differences. Because of this feature, Expand is very suitable for heterogeneous systems, such as clusters of workstations.

6. It simplifies the configuration, because NFS is very familiar to users. Servers only need to export the appropriate directories, and clients only need a small configuration file that describes the Expand partition.

There are other systems that, like Expand, use NFS servers as the basis of their work. Bigfoot-NFS [10], for example, also combines multiple NFS servers. However, this system uses files as the unit of interleaving (i.e., all data of a file reside on one server). Although files in a directory might be interleaved across several machines, it does not allow parallel access to the same file. Another similar system is the Slice file system [5]. Slice is a storage system for high-speed networks that uses a packet filter proxy to virtualize the NFS protocol, presenting to NFS clients a unified shared file volume. This system uses the proxy to distribute file service requests across a server ensemble, and it offers compatibility with existing file system clients. However, the proxy can become a bottleneck that affects the global scalability of the system.
4. Expand Design

Expand provides high-performance I/O by exploiting parallel access to files striped among several NFS servers. Expand is designed as a client-server system with multiple NFS servers, with each Expand file striped across some of the NFS servers. All operations in Expand clients are based on RPCs and the NFS protocol. The first prototype of Expand is a user-level implementation, provided through a library that must be linked with the applications. Expand provides a global name space across the whole cluster. The next sections describe data distribution, file structure, naming and metadata management, parallel access to files, the user interface, and the fault tolerance techniques.
4.1. Data Distribution

To provide large storage capacity and to enhance flexibility, Expand combines several NFS servers to provide a generic striped partition on which several files can be created. Each server exports one or more directories that are combined to build a striped partition. All files in the system are striped across all NFS servers to facilitate parallel access, with each server conceptually storing a subfile of the parallel file. A design goal, not implemented in the current prototype, is to allow an Expand partition to be expanded by adding more servers to an existing distributed partition. This feature increases the scalability of the system and also allows the size of partitions to be increased. When new servers are added to an existing partition, the partition must be rebuilt to accommodate all files. Conceptually, this rebuilding can be made using the following idea:

rebuild_partition(old_partition, new_partition)
{
    for each file in old_partition {
        copy the file into the new partition
        unlink the file in old_partition
    }
}
In this algorithm, when a file is copied into the new partition, the new file is automatically striped across all NFS servers of the new partition. To implement this expansion efficiently, algorithms must be designed that allow a redistribution of data without copying files in their entirety.
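As an illustration of the redistribution problem, the following sketch (a minimal example under the round-robin placement rule described in Section 4.2; none of these names belong to the Expand API) computes which blocks of a file would change server when the partition grows, which is exactly the set of blocks a redistribution algorithm would need to move instead of copying the whole file:

/*
 * Illustrative sketch: with a round-robin layout, block b of a file whose
 * first block lives on server `base` is stored on server (base + b) % n.
 * When a partition grows from old_n to new_n servers, only the blocks whose
 * mapping changes have to be moved.
 */
#include <stdio.h>

static int block_server(int base, int block, int nservers)
{
    return (base + block) % nservers;   /* round-robin placement */
}

int main(void)
{
    int base = 0, old_n = 4, new_n = 6, nblocks = 12, moved = 0;

    for (int b = 0; b < nblocks; b++) {
        int from = block_server(base, b, old_n);
        int to   = block_server(base, b, new_n);
        if (from != to) {
            printf("block %2d: server %d -> server %d\n", b, from, to);
            moved++;
        }
    }
    printf("%d of %d blocks would move\n", moved, nblocks);
    return 0;
}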
4.2. The Structure of Expand Files

A file in Expand consists of several subfiles, one for each NFS partition. All subfiles are fully transparent to Expand users: Expand hides these subfiles, offering clients a traditional view of the file. This idea is similar to the one used in PVFS.
[Figure 2. Expand file with cyclic layout: blocks 0 to 8 of fileA are distributed round-robin across the subfiles stored on Server1 to ServerN, each subfile beginning with its metadata header; one node is the master node for fileA.]

On a distributed partition, the user can create several types of files:
- Striped files with cyclic layout. In these files, blocks are striped across the partition following a round-robin pattern. This structure is shown in Figure 2.

- Fault-tolerant files. These files can use RAID4 or RAID5 schemes to offer fault tolerance.

Each subfile of an Expand file (see Figure 2 and Figure 6) has a small header at the beginning of the subfile. This header stores the file's metadata, which includes the following information:
- Stride size. Each file can use a different stride size. This parameter can be specified in the open operation.

- Kind of file: cyclic, RAID4, or RAID5. This parameter can also be specified in the open operation.

- Base node. This parameter identifies the NFS server where the first block of the file resides.

- Round-robin pattern. All files in Expand, including RAID4 and RAID5 files, are striped using a round-robin pattern. The metadata stores the order used to distribute the file across the servers.

All subfiles have a header for metadata, although only one node, called the master node (described below), stores the current metadata. The master node can be different from the base node.
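As an illustration only (the actual on-disk header format of Expand is not given here, so all field names and sizes below are assumptions), the metadata and the cyclic block mapping it enables could look like this in C:

/*
 * Hypothetical per-subfile header and the cyclic mapping it drives.
 * Field names and sizes are assumptions used purely for illustration.
 */
#include <stdint.h>

enum xpn_file_type { XPN_CYCLIC, XPN_RAID4, XPN_RAID5 };

struct xpn_metadata {
    uint32_t stride_size;      /* striping unit, chosen at open time      */
    uint32_t file_type;        /* cyclic, RAID4, or RAID5                 */
    uint32_t base_node;        /* server holding the first block          */
    uint32_t nservers;         /* number of servers in the partition      */
    uint32_t server_order[64]; /* round-robin order used for distribution */
};

/* Map a logical file offset to (server, offset inside that subfile)
 * for a cyclic file; RAID4/RAID5 files would also skip parity blocks. */
static void xpn_map(const struct xpn_metadata *md, uint64_t offset,
                    uint32_t *server, uint64_t *suboffset)
{
    uint64_t block = offset / md->stride_size;
    *server    = md->server_order[(md->base_node + block) % md->nservers];
    *suboffset = (block / md->nservers) * md->stride_size
               + offset % md->stride_size;
}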
4.3. Naming and Metadata Management

To simplify the naming process and reduce potential bottlenecks, Expand does not use a dedicated metadata manager, such as the one used in PVFS [2]. Figure 3 shows how directory mapping is done in Expand. The Expand directory tree is replicated on all NFS servers. In this way, we can use the NFS lookup operation, without any change, to access all subfiles of a file. This feature also allows access to fault-tolerant files when a server node fails.

[Figure 3. Directory mapping in Expand: the logical user view of the directory tree (/Expand with Dir1 to Dir4, fileA, fileB) is replicated under the exported directories (/export1, /export2, /export3, ..., /exportN) of every NFS server.]

The metadata of a file resides in the header of a subfile stored on one NFS server. This NFS server is the master node of the file, similar to the mechanism used in the Vesta Parallel File System [4]. To obtain the master node of a file, the file name is hashed to obtain the server number:
hash(filename) → NFS_server_i

Initially, the base node of a file coincides with its master node. The use of this simple scheme makes it possible to distribute the master nodes and the blocks among all NFS servers, balancing the use of all NFS servers and, hence, the I/O load. Because the determination of the master node is based on the file name, when a user renames a file the master node of this file changes. The algorithm used in Expand to rename a file is the following:

rename(oldname, newname) {
    oldmaster = hash(oldname)
    newmaster = hash(newname)
    move the metadata from oldmaster to newmaster
}
This process is shown in Figure 4. Moving the metadata is the only operation needed to keep the metadata placement scheme consistent for all Expand files.
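A minimal sketch of this scheme follows (the hash function and all names are illustrative assumptions, not the Expand implementation): the master node is derived from the file name, so a rename only has to relocate the metadata, while the data blocks keep their base node.

#include <stdio.h>

/* Hypothetical hash from file name to server index (simple string hash). */
static unsigned hash_master(const char *name, unsigned nservers)
{
    unsigned h = 5381;
    for (; *name; name++)
        h = h * 33 + (unsigned char)*name;
    return h % nservers;
}

static void xpn_rename(const char *oldname, const char *newname,
                       unsigned nservers)
{
    unsigned oldmaster = hash_master(oldname, nservers);
    unsigned newmaster = hash_master(newname, nservers);

    if (oldmaster != newmaster)
        printf("move metadata of %s from server %u to server %u\n",
               oldname, oldmaster, newmaster);
    /* only the metadata moves; the data blocks stay where they are */
}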
[Figure 4. Renaming a file in Expand: rename(fileA, newFileA) moves the active metadata from the old master node (hash(fileA) = server 2) to the new master node (hash(newFileA) = server 3), while the data blocks stay in place.]
4.4. Parallel Access

NFS clients use filehandles to access files. An NFS filehandle is an opaque reference to a file or directory that is independent of the file name. All NFS operations use a filehandle to identify the file or directory to which the operation applies. Only the server can interpret the data contained within the filehandle. Expand uses a virtual filehandle, which is defined as follows:
virtualFilehandle = ⋃_{i=1}^{N} filehandle_i
where filehandle_i is the filehandle used by NFS server i to reference subfile i of the Expand file. The virtual filehandle is the reference used in Expand for all operations. When Expand needs to access a subfile, it uses the appropriate individual filehandle. Because filehandles are opaque to clients, Expand can use different NFS implementations within the same distributed partition. To enhance I/O, user requests are split by the Expand library into parallel subrequests sent to the involved NFS servers. When a request involves k NFS servers, Expand issues k requests in parallel to the NFS servers, using threads to parallelize the operations. The same criterion is used in all Expand operations: a parallel operation on k servers is divided into k individual operations that use RPC and the NFS protocol to access the corresponding subfiles.
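A minimal sketch of this fan-out is shown below, assuming a hypothetical per-server wrapper nfs_read_subfile() around the NFS READ RPC (none of these names are the actual Expand API); one thread is created per involved server and the caller waits for all of them:

#include <pthread.h>
#include <stdlib.h>

struct subrequest {
    int     server;     /* NFS server holding this part of the file */
    long    suboffset;  /* offset inside that server's subfile      */
    size_t  size;       /* bytes requested from this server         */
    char   *buffer;     /* where this server's data is placed       */
};

/* Placeholder for the NFS READ RPC to one server; not a real API. */
static int nfs_read_subfile(struct subrequest *sr)
{
    (void)sr;
    return 0;
}

static void *worker(void *arg)
{
    nfs_read_subfile(arg);           /* one NFS request per server, in parallel */
    return NULL;
}

/* Issue k subrequests in parallel and wait for all of them. */
int parallel_read(struct subrequest *subreqs, int k)
{
    pthread_t *tids = malloc(k * sizeof(*tids));
    if (tids == NULL)
        return -1;
    for (int i = 0; i < k; i++)
        pthread_create(&tids[i], NULL, worker, &subreqs[i]);
    for (int i = 0; i < k; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return 0;
}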
4.5. User Interface

Expand offers two different interfaces. The first interface is based on POSIX system calls. This interface, however, is not appropriate for parallel applications using strided patterns with small access sizes [14]. Parallel applications can also use Expand through MPI-IO: Expand [1] has been integrated into ROMIO [19] and can be used with MPICH. Figure 5 shows the integration of Expand inside ROMIO.

[Figure 5. Integration of Expand in ROMIO]
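The following is a small sketch of how an application might access an Expand file through MPI-IO once it is integrated in ROMIO. The file path is purely illustrative, and whether a prefix is required to select the Expand ADIO driver depends on the ROMIO configuration, so treat the name as an assumption:

#include <mpi.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    char buf[1024];
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 'a' + rank % 26, sizeof(buf));

    /* each process writes 1 KB of a shared file at a rank-dependent offset */
    MPI_File_open(MPI_COMM_WORLD, "/xpn/fileA",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at(fh, (MPI_Offset)rank * sizeof(buf), buf, sizeof(buf),
                      MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}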
4.6. Fault Tolerance

An important feature for parallel applications is fault tolerance. Fault tolerance is provided in Expand using the RAID concept applied to files. Figure 6 shows a file with a RAID5 configuration. In a RAID5 file, Expand transparently inserts parity blocks following a RAID5 pattern. This solution can tolerate the failure of one I/O node or NFS server. Fault-tolerant files have their metadata replicated in several subfiles.

[Figure 6. Expand file with RAID5 layout: blocks 0 to 8 of fileA and the parity blocks P0-2, P3-5, and P6-8 are distributed across the subfiles on Server1 to Server4, each server also running a network lock manager (NLM).]

In Expand, the parity calculation is performed on the clients, so a lock mechanism is required to ensure the correct ordering among all clients. To provide portability, we use the Network Lock Manager (NLM), a facility that works in cooperation with NFS. The network lock manager contains both server and client functions. The client functions are responsible for processing requests from the applications and sending requests to the network lock manager at the server. The server functions
are responsible for accepting lock requests from clients and generating the appropriate locking calls at the server. The server then responds to the client's locking request. To increase performance, clients only lock the parity unit involved in the operation, and to avoid possible bottlenecks we use several network lock managers, one per I/O node (see Figure 6). Each stripe of data and parity (for example, blocks 0, 1, 2, and P0-2 in the figure) is managed by one NLM in a cyclic pattern. So, when a client wants to lock the first parity unit (blocks 0, 1, 2, and P0-2), it uses the network lock manager located at server 1 (see Figure 6); when a client wants to lock the second parity unit, it uses the network lock manager located at server 2, and so on.
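The client-side parity handling can be sketched as follows (illustrative names only, not the Expand implementation): the parity block of a stripe is the XOR of its data blocks, and the lock manager responsible for a stripe is chosen cyclically among the I/O nodes.

#include <stddef.h>
#include <stdint.h>

/* XOR all data blocks of one stripe into the parity block. */
static void compute_parity(uint8_t *parity, uint8_t *const *data_blocks,
                           int nblocks, size_t block_size)
{
    for (size_t i = 0; i < block_size; i++) {
        uint8_t p = 0;
        for (int b = 0; b < nblocks; b++)
            p ^= data_blocks[b][i];
        parity[i] = p;
    }
}

/* NLM (server) that manages the locks for a given stripe, in cyclic order. */
static int nlm_for_stripe(int stripe, int nservers)
{
    return stripe % nservers;   /* stripe 0 -> first server, stripe 1 -> second, ... */
}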
4.7. Evaluation
To evaluate Expand we have used a parallel program that writes a 100 MB file in an interleaved fashion, using 4, 8, and 16 processes. The whole file is written by all processes in an interleaved fashion, with access sizes from 512 bytes to 64 KB. The same program is used for reading the file. The result of the evaluation is the aggregate bandwidth obtained when executing the tests. The platform used for the evaluation is a cluster of 8 Pentium III biprocessors, each with 1 GB of main memory, connected through Fast Ethernet, and running the Linux operating system (kernel 2.4.5). All experiments have been executed on Expand and PVFS [2]. For both Expand and PVFS, a distributed partition with 8 servers has been used. The block size used for all files, in both the Expand and PVFS tests, has been 8 KB. All clients have been executed on 8 machines; thus, for 16 processes, 2 are executed on each machine. The programs used in the evaluation use the MPI-IO interface, in both PVFS and Expand, and have been executed using MPICH. The NFS servers in this cluster use RPC over the UDP protocol. Figure 7 shows the aggregate bandwidth obtained for write operations, and Figure 8 shows the same results for read operations. Expand has been executed in three configurations: a striped file with cyclic layout and without fault tolerance (XPN), a RAID5 file (XPN-RAID5) with all NFS servers running, and a RAID5 file (XPN-RAID5 fail) with one failed NFS server. In the last case, clients must rebuild file data using the parity blocks inserted in the file. As can be seen in the figures, performance for the read tests is very similar in PVFS and Expand. Only for a RAID5 file with one failure is the performance worse, due to the data reconstruction. For write operations, performance is better for PVFS for the larger access sizes. A possible explanation is the use of TCP in PVFS. The better performance obtained by Expand for short accesses can be explained by the use of the UDP protocol in NFS.
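A minimal sketch of the interleaved write pattern used in these tests is given below (the file name is a placeholder and the fixed 8 KB access size shown here is only one of the sizes tested, which ranged from 512 bytes to 64 KB):

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define FILE_SIZE   (100 * 1024 * 1024)   /* 100 MB, as in the evaluation */

int main(int argc, char **argv)
{
    int rank, nproc;
    size_t access_size = 8 * 1024;        /* placeholder; varied in the tests */
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    char *buf = malloc(access_size);
    memset(buf, 'x', access_size);

    MPI_File_open(MPI_COMM_WORLD, "/xpn/bench100MB",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* each process writes every nproc-th chunk of the shared file */
    size_t nchunks = FILE_SIZE / access_size;
    for (size_t i = rank; i < nchunks; i += nproc) {
        MPI_Offset off = (MPI_Offset)(i * access_size);
        MPI_File_write_at(fh, off, buf, (int)access_size, MPI_CHAR,
                          MPI_STATUS_IGNORE);
    }

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}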
[Figure 7. Parallel write with 4, 8, and 16 processes: aggregate bandwidth (MB/s) versus access size (512 bytes to 64 KB) for PVFS, XPN, XPN-RAID5, and XPN-RAID5(fail).]

[Figure 8. Parallel read with 4, 8, and 16 processes: aggregate bandwidth (MB/s) versus access size (512 bytes to 64 KB) for PVFS, XPN, XPN-RAID5, and XPN-RAID5(fail).]
A notable detail of the evaluation is the poor performance obtained by PVFS in the cluster used in the evaluation for write operations with small access sizes (lower than 1 KB). This problem occurs because the PVFS distribution does not enable the TCP_NODELAY socket option by default.
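For reference, this option is set per socket with setsockopt(); a minimal helper using the standard sockets API (not PVFS code) looks like this, and it disables Nagle's algorithm so that small writes are sent immediately:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Returns 0 on success, -1 on error (see errno). */
int enable_nodelay(int sockfd)
{
    int one = 1;
    return setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}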
[Figure 9. Scalability in read operations with an access size of 2 KB: aggregate bandwidth (MB/s) versus number of clients (1, 4, 8, 16) for PVFS, XPN, XPN-RAID5, and XPN-RAID5(fail).]

[Figure 10. Scalability in write operations with an access size of 2 KB: aggregate bandwidth (MB/s) versus number of clients (1, 4, 8, 16) for PVFS, XPN, XPN-RAID5, and XPN-RAID5(fail).]
In Figures 9 and 10 we examine write and read scaling for a 2 KB access size. The main result is the good scalability obtained for RAID5 files, both without failures and with one failed NFS server.

5. Conclusions and Future Work
In this paper we have presented the design of a new parallel file system, named Expand, for clusters of workstations. Expand is built using NFS servers as its basis. This solution is very flexible because no changes to the NFS servers are required to run Expand. Furthermore, Expand is also independent of the operating system used in the clients, due to the use of RPC and the NFS protocol. Expand combines several NFS servers to provide a distributed partition where files are striped. Expand improves the scalability of the system because distributed partitions can be expanded by adding more NFS servers to the partition. Fault tolerance is obtained using the RAID concept applied to
files. The current prototype of Expand is a user-level implementation, provided through a library that offers a POSIX interface and an MPI-IO interface using ROMIO and MPICH. The evaluation has compared the performance of Expand with that obtained in PVFS. The evaluation shows that Expand offers good results, even for small access sizes and RAID files. Further work is ongoing to optimize Expand by adding a cache to the clients and a cache coherence protocol similar to the one described in [7]. We also want to implement Expand for other operating systems, such as Windows 2000.
References

[1] A. Calderon, F. Garcia, J. Carretero, J. Perez, and J. Fernandez. An Implementation of MPI-IO on Expand: A Parallel File System Based on NFS Servers. Lecture Notes in Computer Science, 2474:306-313, 2002.
[2] P. Carns, W. B. Ligon III, R. Ross, and R. Thakur. PVFS: A Parallel File System for Linux Clusters. Technical Report ANL/MCS-P804-0400, 2000.
[3] J. Carretero, F. Perez, P. de Miguel, F. Garcia, and L. Alonso. Performance Increase Mechanisms for Parallel and Distributed File Systems. Parallel Computing: Special Issue on Parallel I/O Systems, 23:525-542, Apr. 1997.
[4] P. Corbett, S. Johnson, and D. Feitelson. Overview of the Vesta Parallel File System. ACM Computer Architecture News, 21(5):7-15, Dec. 1993.
[5] D. C. Anderson, J. S. Chase, and A. M. Vahdat. Interposed Request Routing for Scalable Network Storage. In Fourth Symposium on Operating System Design and Implementation (OSDI 2000), 2000.
[6] T. Fuerle, O. Jorns, E. Schikuta, and H. Wanek. Meta-ViPIOS: Harness Distributed I/O Resources with ViPIOS. Journal of Research Computing and Systems, Special Issue on Parallel Computing, 1999.
[7] F. Garcia, J. Carretero, F. Perez, P. de Miguel, and L. Alonso. High Performance Cache Management for Parallel File Systems. Lecture Notes in Computer Science, 1573, 1999.
[8] G. Gibson. The Scotch Parallel Storage Systems. Technical Report CMU-CS-95-107, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, 1995.
[9] J. Huber, C. L. Elford, et al. PPFS: A High Performance Portable Parallel File System. In Proceedings of the 9th ACM International Conference on Supercomputing, pages 385-394. IEEE, July 1995.
[10] G. H. Kim and R. G. Minnich. Bigfoot-NFS: A Parallel File-Striping NFS Server. Technical report, Sun Microsystems Computer Corp., 1994.
[11] S. LoVerso, M. Isman, A. Nanopoulos, W. Nesheim, E. Milne, and R. Wheeler. sfs: A Parallel File System for the CM-5. In Proceedings of the 1993 Summer USENIX Conference, pages 291-305, 1993.
[12] T. Madhyastha. Automatic Classification of Input/Output Access Patterns. PhD thesis, University of Illinois, Urbana-Champaign, 1997.
[13] S. A. Moyer and V. S. Sunderam. PIOUS: A Scalable Parallel I/O System for Distributed Computing Environments. In Proceedings of the Scalable High-Performance Computing Conference, pages 71-78, 1994.
[14] N. Nieuwejaar and D. Kotz. The Galley Parallel File System. In Proceedings of the 10th ACM International Conference on Supercomputing, May 1996.
[15] R. Oldfield and D. Kotz. The Armada Parallel File System, 1998. http://www.cs.dartmouth.edu/~dfk/armada/design.html.
[16] D. Patterson, G. Gibson, and R. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID). In Proceedings of ACM SIGMOD, pages 109-116. ACM, June 1988.
[17] P. Pierce. A Concurrent File System for a Highly Parallel Mass Storage Subsystem. In J. L. Gustafson, editor, Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers and Applications, pages 155-161. HCCA, Mar. 1989.
[18] R. Thakur, W. Gropp, and E. Lusk. An Abstract-Device Interface for Implementing Portable Parallel-I/O Interfaces. In Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation, pages 180-187, Oct. 1996.
[19] R. Thakur, W. Gropp, and E. Lusk. On Implementing MPI-IO Portably and with High Performance. In Proceedings of the Sixth Workshop on I/O in Parallel and Distributed Systems, pages 23-32, 1999.
[20] J. del Rosario, R. Bordawekar, and A. Choudhary. Improved Parallel I/O via a Two-phase Run-time Access Strategy. ACM Computer Architecture News, 21(5):31-39, Dec. 1993.
[21] R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon. Design and Implementation of the Sun Network Filesystem. In Proceedings of the 1985 USENIX Conference. USENIX, 1985.