Flexible High-Performance Access to Distributed Storage Resources

Craig J. Patten and K. A. Hawick
Distributed and High-Performance Computing Group,
Department of Computer Science, University of Adelaide,
Adelaide, SA 5005, Australia
{cjp,khawick}@cs.adelaide.edu.au
Abstract

We describe a software architecture for storage services in computational grid environments. Based upon a lightweight message-passing paradigm, the architecture enables the provision and composition of active, distributed storage services. These services can then cooperatively provide access to distributed storage in a manner potentially optimized for both the dataset and the resource environment. We report on the design and implementation of a distributed file system and a dataset-specific satellite imagery service using the architecture. We discuss data movement and storage issues and implications for future work with the architecture.
1. Introduction

With the advent of computational grids and commodity clusters, the issues behind distributed data storage, management and access have become more complex. Existing parallel and distributed file systems and hierarchical storage systems were not designed for the widely variable requirements and environments found within such distributed systems. We have designed and implemented a prototype system providing a basic message-passing interface with which portable higher-level storage services can be produced for such environments. The Distributed Active Resource Architecture (DARC) enables the development of portable, extensible and adaptive storage services which can be optimized for their target user requirements and which are independent of the underlying storage abstraction and end-user interface.

Using the DARC architecture, we have developed a proof-of-concept distributed file system, Strata (meaning divisions or layers within an organized system), which can replicate data across other DARC resources. Interfacing with both of the above systems, we have constructed a demonstrator storage service for GMS-5 [9] satellite imagery which can actively prefetch/replicate information in a dataset-optimized manner. In this paper we describe the design and implementation of the aforementioned systems and discuss our initial experimental results and plans for future work.
2. DARC Architecture

The central element in the design of DARC is the Node. A Node is a daemon on each host which wishes to participate in the architecture, and is the initial point of contact for establishing access to that host. Nodes also provide communications infrastructure between other local and remote DARC elements through a message-passing interface, inspired somewhat by the V Kernel [3].

Another important DARC element is the DataResource. These are entities which, through the local Node, provide remote access to some data storage and/or access mechanism. There are two types of DataResource: firstly, those that simply represent some fixed, local storage medium, for example a hierarchical storage management system; secondly, higher-level, mobile DataResources which are constructed to utilise other fixed or indeed mobile DataResources. These higher-level DataResources can function as single, centralized servers, similar to, for example, Network File System [19, 2] servers. However, the DARC architecture, shown in Figure 1, enables and was indeed intended to be used for the production of distributed, cooperative data services, which collectively harness multiple other resources across potentially wide-area networks. It is for this reason that DataResources are not only active on the "server side" of data operations, but rather provide a distributed presence at all data production and consumption points to improve flexibility and performance.
Figure 1. The Distributed Active Resource Architecture, illustrating the TCP mesh over which data (Xfer) and metadata/control information (MetaXfer) messages travel.
This non-server-centric paradigm is also being used in other "smart" systems in recent literature [24, 28, 22, 23].

When a client wishes to access some remote data service, its Node initiates a local instantiation of the specified DataResource. This entity then communicates with its remote peer instances, and potentially with other local DataResources; for example, a disk cache could provide service to the local client. This mechanism allows DataResources to provide services optimized for both the dataset and the particular circumstances of the client. For example, these could utilize other "nearby" DataResources for replication and prefetching, or could perform dataset-specific optimizations such as data distillation and refinement for clients with minimal bandwidth [5]. A DataResource can also execute arbitrary data transformations or even generate data "on-the-fly", in a similar vein to Sprite's Pseudo-Devices [26] and the Semantic File System [6]. The scheduling and security ramifications of this functionality are areas on which we have yet to fully focus; we believe resource and privilege limits placed upon DataResources constitute a satisfactory initial solution.

Obtaining high performance with bulk data transfers, especially across wide-area networks, has been shown [21] to be fraught with difficulty.
This implies that direct application involvement with network performance optimization is both non-trivial and potentially wasted effort as hardware technology evolves. Our architecture therefore provides a generic abstraction for DataResource communications, through the Node. Using this facility, DataResource instances of both the same and different types can transfer data and metadata between their instances without delving into hardware- and protocol-specific communications optimizations. The abstraction which Nodes provide for bulk data transfers is called an Xfer, and for metadata transfers, a MetaXfer. An Xfer provides a mechanism for a DataResource to transfer information to or from another; this information consists of the requested data's specification and the source and destination host and resource identifiers. Xfer requests can be produced to execute transfers between remote DataResources, and can indeed trivially be used to generate parallel-I/O-style data movements, such as scatter/gather and broadcast. When passed an Xfer or a MetaXfer object which is specified as being "sent" from a remote location, a Node will transport it to the "sending" DataResource. This resource must then, barring failure, execute the transfer.

The issue of access to bulk data over wide-area networks is also being addressed by GASS [1], part of the Globus [4] project. Whereas their focus is on mechanisms and optimizations for prefetching and poststaging complete files to and from HPC resources, ours is upon a flexible, more general architecture for composing services to efficiently utilize distributed storage resources. The Storage Resource Broker (SRB) [13] with its partner Meta Information Catalog (MCAT) [12] also provides access to large-scale, potentially distributed data. Clients specify dataset attributes rather than explicit location information, and these attributes are then matched with metadata stored in a database on an MCAT server to furnish access to the requested data. We, however, are taking the approach of embedding metadata and its associated operations and behaviour within code, in the form of mobile, active DataResources.
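To make the roles of these abstractions concrete, the following minimal Java sketch shows one way a DataResource interface and an Xfer request could be expressed. The names and signatures here (Xfer fields, handleXfer, handleMetaXfer) are illustrative assumptions drawn from the description above, not the actual DARC API.

    // Illustrative sketch only: names and signatures are assumptions
    // based on Section 2, not the actual DARC interfaces.

    // An Xfer names the data to move and its endpoints.
    final class Xfer {
        final String dataSpec;                 // specification of the requested data
        final String srcHost, srcResource;     // "sending" side
        final String dstHost, dstResource;     // "receiving" side

        Xfer(String dataSpec, String srcHost, String srcResource,
             String dstHost, String dstResource) {
            this.dataSpec = dataSpec;
            this.srcHost = srcHost;  this.srcResource = srcResource;
            this.dstHost = dstHost;  this.dstResource = dstResource;
        }
    }

    // A DataResource serves Xfer/MetaXfer requests delivered by its local Node.
    interface DataResource {
        String name();
        // Called by the local Node when this resource is the "sending" party.
        void handleXfer(Xfer request) throws java.io.IOException;
        // Metadata/control messages travel separately from bulk data.
        void handleMetaXfer(Object metadata) throws java.io.IOException;
    }

Under this reading, a broadcast-style movement would simply be a set of Xfer requests sharing the same source but naming different destination resources.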
3. DARC Implementation

Our choice of implementation platform for this work was Java [7], primarily because it provides the portability and dynamic binding we rely upon for instantiating DataResources across heterogeneous grid environments. Java systems do exhibit some well-known disadvantages, such as a non-negligible memory footprint and suboptimal performance relative to native code; however, these are decreasing over time with advances in compiler and virtual machine technology. DataResources must support a Java interface through which Xfer, MetaXfer and other objects may pass. The dynamic instantiation of DataResources is facilitated by a relatively simple custom ClassLoader which we have implemented.
Requesting a Node to "host" a DataResource involves informing it of the location of another Node which already executes the resource. The bootstrap case is handled with a simple client program which acts as a Node. The remote DataResource then "injects" itself into the local virtual machine and is locally instantiated. It is then free to communicate with other local and remote DataResources to provide its services and use those of others. These communications are generally intended to occur through the local Node, but this is not a rule.

Whilst flexible, Xfers, MetaXfers and the mobility mechanism are obviously open to abuse by malicious DataResources. We have yet to address the security issues regarding which DataResources can request transfers between which other resources, for example. We do, however, envision utilizing a public-key infrastructure for Xfer and MetaXfer encryption and signing. Current Java implementations also provide facilities for restricting system resource usage, such as confining access to certain directory subtrees within local filesystems; we are currently examining the utility of this approach.

As stated earlier, network performance optimization across WANs is difficult, and a full discussion of the relevant issues is beyond the scope of this paper. When examining potential implementation mechanisms for the Xfer/MetaXfer transport functionality, the Remote Method Invocation (RMI) feature of Java initially appeared to constitute a readily usable solution due to its RPC-like simplicity. However, in our implementation, Nodes communicate Xfer and MetaXfer objects through parallel, persistent TCP connections. Upon the first communication between two Nodes, the initiator listens on a newly allocated port and uses RMI to request that the remote Node connect to that port; this is similar to the operation of the FTP [16] protocol. The connection can then be used for future full-duplex communication of both Xfers, with their associated data, and MetaXfers between Nodes. In the case where a Node wishes to connect to another but finds all of its current connections to that host either in use or terminated, it initiates a new connection to minimise latency. Unbounded, this could be wasteful and cause network congestion, so we have implemented a simple throttling mechanism to limit the number of open connections. The reasons we have chosen this communications system are similar to the motivations for persistent HTTP connections [11]. Initiating an RMI request for each transfer would incur a performance penalty due to the round-trip latency (which can be especially large in wide-area grid systems), the TCP slow-start mechanism, and RMI protocol-specific overheads. Our solution was relatively straightforward to implement, and intuitively decreases the impact of these overheads.
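The injection step can be illustrated with a minimal custom ClassLoader sketch. The NodeStub handle and its fetchClassBytes method are hypothetical stand-ins for whatever remote class retrieval mechanism DARC actually uses; only the ClassLoader machinery itself is standard Java.

    // Minimal sketch of dynamic DataResource instantiation via a custom
    // ClassLoader. NodeStub and fetchClassBytes() are hypothetical stand-ins
    // for DARC's actual remote class retrieval.
    class RemoteResourceClassLoader extends ClassLoader {
        interface NodeStub {                       // assumed remote-node handle
            byte[] fetchClassBytes(String className) throws java.io.IOException;
        }

        private final NodeStub remoteNode;

        RemoteResourceClassLoader(NodeStub remoteNode) {
            this.remoteNode = remoteNode;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            try {
                byte[] bytecode = remoteNode.fetchClassBytes(name);
                return defineClass(name, bytecode, 0, bytecode.length);
            } catch (java.io.IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    // Usage: load the remote DataResource class and instantiate it locally, e.g.
    //   Class<?> c = new RemoteResourceClassLoader(node).loadClass("SomeDataResource");
    //   Object resource = c.getDeclaredConstructor().newInstance();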
The MetaXfer abstraction is implemented as a Java interface. DataResources can use the basic implementation we have provided to transfer "generic" metadata information. If this is not rich enough, or if "internal" metadata transfers are required between resource instances, Java's interface mechanism allows for the straightforward implementation of custom MetaXfers.
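As an illustration of this extension point, the fragment below sketches what a custom MetaXfer might look like. The interface shape shown (a type tag plus a serializable payload) is our assumption for illustration, not the interface DARC actually defines.

    // Hypothetical shape of the MetaXfer extension point; the methods below
    // are assumptions for illustration only.
    interface MetaXfer extends java.io.Serializable {
        String type();        // what kind of metadata this carries
        Object payload();     // the metadata itself
    }

    // A dataset-specific MetaXfer: replica-location hints exchanged
    // between instances of the same DataResource type.
    final class ReplicaHintMetaXfer implements MetaXfer {
        private final java.util.List<String> replicaHosts;

        ReplicaHintMetaXfer(java.util.List<String> replicaHosts) {
            this.replicaHosts = replicaHosts;
        }

        @Override public String type()    { return "replica-hint"; }
        @Override public Object payload() { return replicaHosts; }
    }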
4. DARC Service: Strata

To investigate and evaluate the above architecture we have designed and implemented a distributed file system prototype using DARC. This system, known as Strata, whilst not yet providing a complete filesystem interface, does present a single, location-transparent directory hierarchy and also supports file and directory replication. Strata uses a local filesystem DataResource for storage, and uses DARC for transfers of both data (Xfer) and control information (MetaXfer) between instances.

The top level of Strata's distributed architecture is the "root" directory. This is globally replicated across all hosts in the same Strata set (the group of Strata instances which share the same root manager) and is centrally managed by one Strata instance. Whilst initially this seems a fundamental flaw in a system designed to potentially operate across wide-area networks, the centralization impediment is alleviated by the use of the group membership algorithm described by Ricciardi and Birman in [18]. Whilst a broadcast algorithm, in the logical rather than physical sense in our implementation, it enables relatively lightweight, low-complexity group reconfiguration upon perceived process failure. Thus, in the assumedly rare case where the root directory manager is believed to have failed by a majority of other Strata instances, another host will assume the managerial position and the original manager will be removed from the group. Whilst the use of this algorithm makes updates to the root directory expensive, we assume that such operations do not occur often.

This group membership protocol is also used for maintaining replication information for all other files and directories within Strata. Our assumptions here are that for mutable files and directories there will not be a large number of replicas, and that for all files and directories the group membership flux will not be high. Within all directories, including the root directory, pathname components are hashed to group identities. A group identity maps to the group, one of which exists for each file or directory, engaged to track the Strata instances which manage its replicas. The membership of each entity's group consists of all Strata nodes which host replicas of that entity and all Strata nodes which host replicas of the entity's parent directory.
The parent directory nodes are required to be "honorary" members so that they can themselves track the entry and exit of actual replicas.

When a host wishes to instantiate a Strata presence, either to offer a storage resource or to access a set of Strata nodes, it must use the code mobility mechanism described in Section 2. All Strata operations from remote hosts access any offered storage through the local Strata instance, as do locally-requested operations destined for remote hosts. Our current implementation is limited to the lookup, create, mkdir, read, write and readdir filesystem operations.

To explain how these operations work with the group mechanism, let us first examine lookup. When a lookup operation is executed upon an absolute pathname, the local Strata instance recursively consults potentially remote directories and groups to locate a replica of the parent directory of the given entity. This mechanism is similar to the prefix tables used in Sprite's distributed file system [25]. The replica, once located, returns the requested information to the caller. Optimal replica location is a problem for which we have only implemented a cursory solution, evaluating replica "distance" by examining the network portion of IP addresses [17]. However, our work could easily leverage or interface with more sophisticated "network weather" and network-aware systems and techniques produced elsewhere [27, 21].

If a Strata instance wishes to replicate a file or directory, it simply uses the group membership protocol to join that entity's group as a full member. Strata also exposes a minimal replication API. However, as we do not facilitate write-sharing, we have not incorporated a conflict resolution algorithm such as that described in [20]. At the time of writing we limit replication to the granularity of whole files; more flexible replication through the creation of multiple replica groups per file is under implementation. We believe this extra overhead is acceptable assuming file replica fragmentation is minimal, i.e. fewer replicas storing larger file portions rather than the converse.

File and directory modification is currently restricted to the manager of the group for that entity. The initial manager of a group is its creator; after that, a Strata host wishing to modify the entity must both (a) join the group as a full replica and (b) initiate a group reconfiguration to establish itself as manager. When a write is performed on the manager's replica, the write operation is sent to all replicas of that file or directory. The Node transport layer sends these operations through one FIFO channel per host to maintain their ordering. Using the group membership algorithm for this process simplifies the inner workings of Strata. The view-versioning mechanism intrinsic to the group membership algorithm also aids our crash recovery prototype for Strata, the full details of which are beyond the scope of this paper.
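A minimal sketch of the pathname-component-to-group mapping described above follows. The group-identity type, the use of a simple string hash and the directory map are illustrative assumptions rather than Strata's actual data structures.

    // Illustrative sketch of hashing pathname components to group identities,
    // in the spirit of Strata's directory mechanism. GroupId representation,
    // String.hashCode() and the entry map are assumptions for illustration.
    import java.util.HashMap;
    import java.util.Map;

    final class DirectoryReplica {
        // pathname component -> identity of the group tracking that entity's replicas
        private final Map<String, Long> entries = new HashMap<>();

        void addEntry(String component) {
            entries.put(component, groupIdentity(component));
        }

        // Deterministic mapping from a pathname component to a group identity.
        static long groupIdentity(String pathnameComponent) {
            return pathnameComponent.hashCode() & 0xffffffffL;
        }

        // A lookup walks an absolute pathname one component (and possibly one
        // directory replica) at a time; this resolves a single component.
        Long lookup(String component) {
            return entries.get(component);   // null if no such entry here
        }
    }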
Our initial client interface to Strata is Java RMI; however, we are interfacing a dynamically extensible NFS server produced in our previous work [14, 15] to enable "standard" file system access. Both of these interfaces incur overheads, such as redundant data copying when using TCP or UDP between processes on the same host, and are far from optimal. We are therefore also examining a Vnode [10] interface within operating systems such as FreeBSD and Linux. This would yield potentially major performance improvements by avoiding the overheads inherent in loopback-NFS/RMI interfaces, at the cost of some portability. It is important to note that the DARC architecture itself places no restrictions on possible interfaces, apart from the requirement that they must be able to communicate with DataResources through the standard Xfer and MetaXfer mechanisms described earlier.
5. DARC Service: GMS-5 Satellite Imagery

We have implemented a basic DataResource demonstrator to provide access to a GMS-5 satellite image repository. The Japanese GMS-5 satellite produces approximately 200MB of four-channel 2291x2291-pixel earth imagery per day, and we currently maintain an archive of the last few years' imagery. Archive access is provided through a GMS-5 DataResource, and thence Strata and DARC. This allows us to abstract over the location of the underlying GMS-5 data stored by Strata, except when we wish to issue Strata explicit replication instructions based upon dataset-specific knowledge, such as geographical and temporal locality. This approach has proven much more attractive to users of our satellite data archive than our previous cgi-bin mechanism [8].
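To illustrate the kind of dataset-specific behaviour such a DataResource can embed, the sketch below shows a simple temporal-locality prefetch heuristic of the sort exercised in Section 6. The class and method names, the image-identifier format and the two-request trigger are assumptions for illustration, not the demonstrator's actual code.

    // Illustrative one-image-ahead prefetch heuristic for time-sequential
    // GMS-5 imagery. Identifiers, requestReplication() and the trigger
    // condition are assumptions for illustration only.
    final class Gms5PrefetchPolicy {
        interface ReplicationHook {           // assumed callback into Strata
            void requestReplication(String imageId, String targetHost);
        }

        private final ReplicationHook strata;
        private final String nearbyHost;      // e.g. a host on the client's LAN
        private String previousImageId;

        Gms5PrefetchPolicy(ReplicationHook strata, String nearbyHost) {
            this.strata = strata;
            this.nearbyHost = nearbyHost;
        }

        // Called on every client image request; after two consecutive
        // time-sequential requests, replicate the next image ahead of use.
        void onRequest(String imageId) {
            if (previousImageId != null && isNextInSequence(previousImageId, imageId)) {
                strata.requestReplication(nextImageId(imageId), nearbyHost);
            }
            previousImageId = imageId;
        }

        private static boolean isNextInSequence(String earlier, String later) {
            return imageIndex(later) == imageIndex(earlier) + 1;
        }

        private static String nextImageId(String imageId) {
            return "gms5-" + (imageIndex(imageId) + 1);   // hypothetical naming scheme
        }

        private static long imageIndex(String imageId) {
            // Assumes identifiers of the form "gms5-<sequence number>".
            return Long.parseLong(imageId.substring(imageId.lastIndexOf('-') + 1));
        }
    }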
6. Experiments and Results

We have conducted latency and bandwidth experiments with DARC and Strata to examine the overheads involved in the architecture. We used Java 1.2.2 on both Linux 2.2.5-15smp and Solaris 2.6. Note that Sun's Linux/Intel Java Virtual Machine (JVM) does not feature Just-In-Time (JIT) compilation. The associated performance detriment remains unquantified, though local operations (the lookup shown below) are quite slow on the Linux/Intel JVM relative to the Solaris/SPARC JVM, potentially for this reason.

We examined the cost of performing a lookup from the Java RMI client: to a local Strata instance; one directory hop to a remote instance on the same LAN; and one directory hop to the Australian National University (ANU) across AARNet2, the Australian Academic Research Network (ICMP round-trip time of approximately 23 ms). The results are shown in Tables 1-3, respectively. In the remote cases, the actual number of directory hops on either host made negligible impact on performance; the network hop itself was the dominating factor. These measurements were performed after the persistent TCP connections were created between Nodes, and thus do not include any connection establishment overhead between remote hosts. They do, however, include RMI overhead between the client and DARC on the local host.

Platform                               Time
Dual Intel P-III (500 MHz) (no JIT)    27 ms
Sun U5 (440 MHz) (JIT)                  3 ms
Sun UE450 (2x300 MHz) (JIT)             4 ms

Table 1. Wall time to execute a local lookup.
Platforms                                Network                    Time
2 x Dual Intel P-III (500 MHz)           100BaseT                   47 ms
2 x Dual Intel P-III (500 MHz)           1000BaseT (Intel/Fiber)    39 ms
Sun U5 (333 MHz) to Sun U5 (440 MHz)     ATM ELAN                   99 ms

Table 2. Wall time to execute a lookup one directory hop away on the same LAN.
Platforms                                     Time
Sun UE3000 (2x250 MHz) to Sun U5 (440 MHz)    159 ms

Table 3. Wall time to execute a lookup one directory hop away over the AARNet2 WAN.
The relative magnitudes of the remote lookup figures and the actual network latencies as measured using ICMP (ping) demonstrate that there are improvements to be made. We may examine using UDP for small control messages; however, this would complicate DARC, which currently relies on TCP for reliable communication. Even so, the actual impact of the above lookup times on real workloads would likely not be as large as in a protocol with a higher number of round trips, such as NFS. For example, in Strata, readdir operations incur the same latency costs as lookup, since a directory hashtable encapsulates the directory entries themselves, thus rendering extra lookups unnecessary.
Our measurements of bandwidth utilization between Strata instances are shown in Table 4. Across the 100BaseT LAN, ATM ELAN and AARNet2 WAN networks tested, the mean bandwidth Strata received was indistinguishable from that obtained with ttcp, a simple bandwidth measurement tool. The bandwidth utilisation across the 1000BaseT LAN, which was largely independent of send/receive buffer sizes, was poor for both DARC/Strata and ttcp. However, this is not surprising given the immense workload which attempting to saturate such a high-speed network presents to the host computers, and we will examine the scalability of our system's performance as the underlying gigabit drivers and technology improve.

Platforms                                                   ttcp         Strata
2 x Dual Intel P-III (500 MHz), 100BaseT                    11.2 MB/s    11.2 MB/s
2 x Dual Intel P-III (500 MHz), 1000BaseT (Intel/Fiber)     29.1 MB/s    16.7 MB/s
Sun U5 (440 MHz) to Sun U5 (333 MHz), ATM ELAN              10.9 MB/s    10.9 MB/s
Sun U5 (440 MHz) to Sun UE3000 (2x250 MHz), AARNet2 WAN     206 KB/s     206 KB/s

Table 4. Received ttcp and inter-Strata Xfer bandwidth, both using 64KB TCP send/receive buffers.
The above latency and bandwidth performance measurements do not, of course, necessarily translate into the performance of our systems when tasked with realistic workloads. Such performance will also depend heavily on the design, implementation and composition of the specific DataResources used with DARC. To demonstrate the potential utility of the active nature of DataResources, we implemented our GMS-5 imagery retrieval service to prefetch imagery files to the local host when multiple time-sequential images were requested by a client. This potentially aids the performance of time-series processing across the dataset by overlapping computation and communication. Additionally, since the prefetching of imagery by the GMS-5 DataResource is actually replication performed by Strata, prefetched imagery is automatically available at a presumably lower cost to network-proximate hosts using the GMS-5/Strata DataResources.

Example timing results from three time-series processings of 100 sequential GMS-5 satellite images appear in Table 5. The client machine, a Sun Ultra 5 (440 MHz), operated upon each 5MB image for a mean time of 35s. The imagery was retrieved from a remote Sun Ultra Enterprise 3000 (2x250 MHz) located across AARNet2 at ANU. In the first case, the client simply iterated through a retrieve-process loop. Next, another Ultra 5 (333 MHz) on the same ATM ELAN as the client joined the Strata set and was used by GMS-5/Strata for one-image-ahead prefetching/replication of the data, once it had established from the first two images that a sequential time-series process was potentially taking place. Finally, the client Ultra 5 performed the same prefetching/replication using its own local disk. The table shows the total experiment time minus image processing time, and the average received bandwidth figures for the client's GMS-5 DataResource object, both including and excluding the initial two non-prefetched images. Note that the ATM ELAN was slightly loaded by production traffic, providing slightly less bandwidth than when the measurements in Table 4 were taken. Additionally, disk caching on the client Ultra 5 is responsible for its large replicate bandwidth figure. Whilst this is a simple example, it does indicate the potentially large performance improvements to be made when using DARC to flexibly leverage resources such as storage and communications.

Prefetching/Replication    Retrieval Time (s)    Average Bandwidth    Replicate Bandwidth
None                       2508                  204.4 KB/s           -
Nearby                     50 + 64               4.4 MB/s             7.7 MB/s
Local                      50 + 4                9.3 MB/s             122.6 MB/s

Table 5. Timing results for the GMS-5 satellite imagery retrieval and processing example.
7. Conclusions and the Future

We have presented a prototype system design and implementation for composing distributed data storage services. We believe DARC is flexible since it enables the separation of underlying storage mechanisms, dataset policies and client interfaces in a portable manner. Our preliminary performance results are encouraging, but show that there is room for improvement, on which we are currently working. The results also show that systems built on top of our architecture can efficiently exploit bandwidth across local and wide-area networks. Our conclusion on the use of Java as the storage resource middleware is that, overall, it was the right choice for our requirements. Alongside the performance optimizations and workload investigation, we are also examining security and performance-prediction technologies to leverage in the development of DARC. We are also looking at future DataResource designs and implementations to investigate the possibilities of
using a message-passing storage architecture such as DARC in modern HPC clusters and wide-area grid environments.
8. Acknowledgements

We acknowledge the support provided by the Research Data Networks and Advanced Computational Systems Cooperative Research Centres (CRC) established under the Australian Government's CRC Program.
References

[1] J. Bester, I. Foster, C. Kesselman, J. Tedesco, and S. Tuecke. GASS: A Data Movement and Access Service for Wide Area Computing Systems. In Proceedings of the 6th Workshop on I/O in Parallel and Distributed Systems, May 1999.
[2] B. Callaghan, B. Pawlowski, and P. Staubach. RFC 1813: NFS Version 3 Protocol Specification, June 1995. See also RFC 1094 [19].
[3] D. R. Cheriton. The V Kernel: A Software Base for Distributed Systems. IEEE Software, 1(2):19-42, Apr. 1984.
[4] I. Foster and C. Kesselman. Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications, 1996.
[5] A. Fox. A Framework For Separating Server Scalability and Availability from Internet Application Functionality. PhD thesis, University of California, Berkeley, 1998.
[6] D. K. Gifford, P. Jouvelot, M. A. Sheldon, and J. W. O'Toole Jr. Semantic File Systems. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, Oct. 1991.
[7] J. Gosling, B. Joy, and G. Steele. The Java Language Specification. The Java Series. Addison Wesley Longman, 1996. ISBN 0-201-63451-1.
[8] H. A. James and K. A. Hawick. A Web-based Interface for On-Demand Processing of Satellite Imagery Archives. In Proceedings of the Australian Computer Science Conference (ACSC) '98, Perth, WA, Australia, Feb. 1998.
[9] Japanese Meteorological Satellite Center, Tokyo, Japan. The GMS User's Guide, second edition, 1989.
[10] S. R. Kleiman. Vnodes: An Architecture for Multiple File System Types in Sun UNIX. In Proceedings of the Summer USENIX Conference, pages 238-247, Atlanta, GA, USA, June 1986.
[11] J. C. Mogul. The Case for Persistent-Connection HTTP. In Proceedings of the ACM SIGCOMM '95 Conference, Cambridge, MA, USA, Aug. 1995.
[12] National Partnership for Advanced Computational Infrastructure. Meta Information Catalog (MCAT). Available at http://www.npaci.edu/DICE/SRB/mcat.html.
[13] National Partnership for Advanced Computational Infrastructure. Storage Resource Broker (SRB). Available at http://www.npaci.edu/DICE/SRB.
[14] C. J. Patten, K. A. Hawick, and J. F. Hercus. Towards a Scalable Metacomputing Storage Service. In Proceedings of the Seventh International Conference on High Performance Computing and Networking Europe, Amsterdam, The Netherlands, Apr. 1999.
[15] C. J. Patten, F. A. Vaughan, K. A. Hawick, and A. L. Brown. DWorFS: File System Support for Legacy Applications in DISCWorld. In Proceedings of the Fifth Integrated Data Environments Australia Workshop, Fremantle, Australia, Feb. 1998.
[16] J. Postel and J. Reynolds. RFC 959: File Transfer Protocol, Oct. 1985.
[17] R. van Renesse, Y. Minsky, and M. Hayden. A Gossip-Style Failure Detection Service. Technical Report TR98-1687, Cornell University Computer Science Department, June 1998.
[18] A. M. Ricciardi and K. P. Birman. Using Process Groups to Implement Failure Detection in Asynchronous Environments. In Proceedings of the Tenth ACM Symposium on Principles of Distributed Computing, pages 341-353, 1991.
[19] Sun Microsystems, Inc. RFC 1094: NFS: Network File System Protocol Specification, Mar. 1989. See also RFC 1813 [2].
[20] D. Terry, M. Theimer, K. Petersen, A. Demers, M. Spreitzer, and C. H. Hauser. Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, Copper Mountain Resort, CO, USA, Dec. 1995.
[21] B. Tierney, J. Lee, B. Crowley, M. Holding, J. Hylton, and F. Drake. A Network-Aware Distributed Storage Cache for Data Intensive Environments. In Proceedings of the Eighth IEEE High Performance Distributed Computing Conference, Redondo Beach, CA, USA, Aug. 1999.
[22] A. Vahdat, T. Anderson, M. Dahlin, E. Belani, D. Culler, P. Eastham, and C. Yoshikawa. WebOS: Operating System Services For Wide Area Applications. In Proceedings of the Seventh IEEE Symposium on High Performance Distributed Computing, Chicago, IL, USA, July 1998.
[23] A. Vahdat, M. Dahlin, T. Anderson, and A. Aggarwal. Active Names: Flexible Location and Transport of Wide-Area Resources. In Proceedings of the Second USENIX Symposium on Internet Technologies and Systems, Boulder, CO, USA, Oct. 1999.
[24] J. B. Weissman. Smart File Objects: A Remote File Access Paradigm. In Proceedings of the Sixth Workshop on Input/Output in Parallel and Distributed Systems, pages 89-97, Atlanta, GA, USA, May 1999.
[25] B. B. Welch and J. K. Ousterhout. Prefix Tables: A Simple Mechanism for Locating Files in a Distributed File System. In Proceedings of the Sixth International Conference on Distributed Computing Systems, May 1986.
[26] B. B. Welch and J. K. Ousterhout. Pseudo-Devices: User-Level Extensions to the Sprite File System. In Proceedings of the Summer USENIX Conference, June 1988.
[27] R. Wolski, N. Spring, and C. Peterson. Implementing a Performance Forecasting System for Metacomputing: The Network Weather Service. In Proceedings of SC'97: High Performance Networking and Computing, San Jose, CA, USA, 1997.
[28] C. Yoshikawa, B. Chun, P. Eastham, A. Vahdat, T. Anderson, and D. Culler. Using Smart Clients to Build Scalable Services. In Proceedings of the USENIX Technical Conference, Anaheim, CA, USA, Jan. 1997.